提交 · 63c36c40b9b031b760f89f5991843b6eeb6314e7 · openeuler / raspberrypi-kernel

08 12月, 2016 21 次提交

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 63c36c40

由 David S. Miller 提交于 12月 07, 2016

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-12-07

This series contains updates to i40e and i40evf only.

Filip modifies the i40e to log link speed change and when the link is
brought up and down.

Mitch replaces i40e_txd_use_count() with a new function which is slightly
faster and better documented so the dim witted can better follow the
code.  Fixes the locking of the service task so that it is actually
done in the service task and not in the scheduling function which calls
the service task.

Jacob, being the busy little beaver he is, provides most of the changes
starting restores a workaround that is still needed in some configurations,
specifically the Ethernet Controller XL710 for 40GbE QSFP+.  Removes
duplicate code and simplifies the i40e_vsi_add_vlan() and
i40e_vsi_kill_vlan() functions.  Removes detection of PTP frames over L4
(UDP) on the XL710 MAC, since there was a product decision to defeature
it.  Fixed a previous refactor of active filters which caused issues in
the accounting of active_filters.  Remaining work was done in the VLAN
filters to improve readability and simplify code as much as possible
to reduce inconsistencies.

Alex fixes foul budget accounting in core code by returning actual
work done, capped to budget-1.

Henry fixes the "ethtool -p" function for 1G BaseT PHYs.

Carolyn adds support for 25G devices for i40e and i40evf.

Michal adds functions to apply the correct access method for external PHYs
which could use Clause22 or Clause45 depending on the PHY.

v2: dropped last patch from previous series, since changes are needed based
    on feedback from Sergei Shtylyov
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63c36c40

dummy: expend mtu range for dummy device · 25e3e84b

由 Zhang Shengju 提交于 12月 07, 2016

After commit 61e84623 ("net: centralize net_device min/max MTU checking"),
the mtu range for dummy device becomes [68, 1500].

This patch extends it to [0, 65535].
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25e3e84b

nlmon: use core MTU range checking in nlmon driver · e82621e3

由 Zhang Shengju 提交于 12月 07, 2016

Since commit 61e84623 ("net: centralize net_device min/max MTU checking"),
mtu range is checked at dev_set_mtu().

This patch adds min_mtu for nlmon device and remove unnecessary
ndo_change_mtu() function.
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e82621e3

driver: macvlan: Remove the rcu member of macvlan_port · a1f5315c

由 Gao Feng 提交于 12月 07, 2016

When free macvlan_port in macvlan_port_destroy, it is safe to free
directly because netdev_rx_handler_unregister could enforce one
grace period.
So it is unnecessary to use kfree_rcu for macvlan_port.
Signed-off-by: NGao Feng <fgao@ikuai8.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1f5315c

driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu · 48140a21

由 Gao Feng 提交于 12月 07, 2016

There are two functions which would free the ipvl_port now. The first
is ipvlan_port_create. It frees the ipvl_port in the error handler,
so it could kfree it directly. The second is ipvlan_port_destroy. It
invokes netdev_rx_handler_unregister which enforces one grace period
by synchronize_net firstly, so it also could kfree the ipvl_port
directly and safely.

So it is unnecessary to use kfree_rcu to free ipvl_port.
Signed-off-by: NGao Feng <fgao@ikuai8.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48140a21

bpf: fix loading of BPF_MAXINSNS sized programs · ef0915ca

由 Daniel Borkmann 提交于 12月 07, 2016

General assumption is that single program can hold up to BPF_MAXINSNS,
that is, 4096 number of instructions. It is the case with cBPF and
that limit was carried over to eBPF. When recently testing digest, I
noticed that it's actually not possible to feed 4096 instructions
via bpf(2).

The check for > BPF_MAXINSNS was added back then to bpf_check() in
cbd35700 ("bpf: verifier (add ability to receive verification log)").
However, 09756af4 ("bpf: expand BPF syscall with program load/unload")
added yet another check that comes before that into bpf_prog_load(),
but this time bails out already in case of >= BPF_MAXINSNS.

Fix it up and perform the check early in bpf_prog_load(), so we can drop
the second one in bpf_check(). It makes sense, because also a 0 insn
program is useless and we don't want to waste any resources doing work
up to bpf_check() point. The existing bpf(2) man page documents E2BIG
as the official error for such cases, so just stick with it as well.

Fixes: 09756af4 ("bpf: expand BPF syscall with program load/unload")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef0915ca

net: stmmac: do not call phy_ethtool_ksettings_set from atomic context · 90364fea

由 Niklas Cassel 提交于 12月 06, 2016

>From what I can tell, spin_lock(&priv->lock) is not needed, since the
phy_ethtool_ksettings_set call is not given the priv struct.

phy_start_aneg takes the phydev->lock. Calls to phy_adjust_link
from phy_state_machine also takes the phydev->lock.

[   13.718319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
[   13.726717] in_atomic(): 1, irqs_disabled(): 0, pid: 1307, name: ethtool
[   13.742115] Hardware name: Axis ARTPEC-6 Platform
[   13.746829] [<80110568>] (unwind_backtrace) from [<8010c2bc>] (show_stack+0x18/0x1c)
[   13.754575] [<8010c2bc>] (show_stack) from [<80433484>] (dump_stack+0x80/0xa0)
[   13.761801] [<80433484>] (dump_stack) from [<80145428>] (___might_sleep+0x108/0x170)
[   13.769554] [<80145428>] (___might_sleep) from [<806c9b50>] (mutex_lock+0x24/0x44)
[   13.777128] [<806c9b50>] (mutex_lock) from [<8050cbc0>] (phy_start_aneg+0x1c/0x13c)
[   13.784783] [<8050cbc0>] (phy_start_aneg) from [<8050d338>] (phy_ethtool_ksettings_set+0x98/0xd0)
[   13.793656] [<8050d338>] (phy_ethtool_ksettings_set) from [<80517adc>] (stmmac_ethtool_set_link_ksettings+0xa0/0xb4)
[   13.804184] [<80517adc>] (stmmac_ethtool_set_link_ksettings) from [<805c5138>] (ethtool_set_settings+0xd4/0x13c)
[   13.814358] [<805c5138>] (ethtool_set_settings) from [<805c9718>] (dev_ethtool+0x13c4/0x211c)
[   13.822882] [<805c9718>] (dev_ethtool) from [<805dc7c0>] (dev_ioctl+0x480/0x8e0)
[   13.830291] [<805dc7c0>] (dev_ioctl) from [<80260e34>] (do_vfs_ioctl+0x94/0xa00)
[   13.837699] [<80260e34>] (do_vfs_ioctl) from [<802617dc>] (SyS_ioctl+0x3c/0x60)
[   13.845011] [<802617dc>] (SyS_ioctl) from [<801088bc>] (__sys_trace_return+0x0/0x10)
Signed-off-by: NNiklas Cassel <niklas.cassel@axis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90364fea

Merge branch 'ti-cpts-update-and-fixes' · 64e8de58

由 David S. Miller 提交于 12月 07, 2016

Grygorii Strashko says:

====================
net: ethernet: ti: cpts: update and fixes

It is preparation series intended to clean up and optimize TI CPTS driver to
facilitate further integration with other TI's SoCs like Keystone 2.

Changes in v5:
- fixed copy paste error in cpts_release
- reworked cc.mult/shift and cc_mult initialization

Changes in v4:
- fixed build error in patch
  "net: ethernet: ti: cpts: clean up event list if event pool is empty"
- rebased on top of net-next

Changes in v3:
- patches reordered: fixes and small updates moved first
- added comments in code about cpts->cc_mult
- conversation range (maxsec) limited to 10sec

Changes in v2:
- patch "net: ethernet: ti: cpts: rework initialization/deinitialization"
  was split on 4 patches
- applied comments from Richard Cochran
- dropped patch
  "net: ethernet: ti: cpts: add return value to tx and rx timestamp funcitons"
- new patches added:
  "net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs"
  and "clocksource: export the clocks_calc_mult_shift to use by timestamp code"

Links on prev versions:
v4: https://lkml.org/lkml/2016/12/6/496
v3: https://www.spinics.net/lists/devicetree/msg153474.html
v2: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1282034.html
v1: http://www.spinics.net/lists/linux-omap/msg131925.html
====================
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64e8de58

net: ethernet: ti: cpts: fix overflow check period · 20138cf9

由 Grygorii Strashko 提交于 12月 06, 2016

The CPTS drivers uses 8sec period for overflow checking with
assumption that CPTS retclk will not exceed 500MHz. But that's not
true on some TI platforms (Kesytone 2). As result, it is possible that
CPTS counter will overflow more than once between two readings.

Hence, fix it by selecting overflow check period dynamically as
max_sec_before_overflow/2, where
 max_sec_before_overflow = max_counter_val / rftclk_freq.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20138cf9

net: ethernet: ti: cpts: calc mult and shift from refclk freq · 88f0f0b0

由 Grygorii Strashko 提交于 12月 06, 2016

The cyclecounter mult and shift values can be calculated based on the
CPTS rfclk frequency and timekeepnig framework provides required algos
and API's.

Hence, calc mult and shift basing on CPTS rfclk frequency if both
cpts_clock_shift and cpts_clock_mult properties are not provided in DT (the
basis of calculation algorithm is borrowed from
__clocksource_update_freq_scale() commit 7d2f944a ("clocksource:
Provide a generic mult/shift factor calculation")). After this change
cpts_clock_shift and cpts_clock_mult DT properties will become optional.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88f0f0b0

clocksource: export the clocks_calc_mult_shift to use by timestamp code · 5304121a

由 Murali Karicheri 提交于 12月 06, 2016

The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and
requires to know mult and shift factors for timestamp conversion from raw
value to nanoseconds (ptp clock). Now these mult and shift factors are
calculated manually and provided through DT, which makes very hard to
support of a lot number of platforms, especially if CPTS refclk is not the
same for some kind of boards and depends on efuse settings (Keystone 2
platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of
mult and shift factors.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: NMurali Karicheri <m-karicheri2@ti.com>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5304121a

net: ethernet: ti: cpts: move dt props parsing to cpts driver · 4a88fb95

由 Grygorii Strashko 提交于 12月 06, 2016

Move DT properties parsing into CPTS driver to simplify CPSW
code and CPTS driver porting on other SoC in the future
(like Keystone 2) - with this change it will not be required
to add the same DT parsing code in Keystone 2 NETCP driver.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a88fb95

net: ethernet: ti: cpts: rework initialization/deinitialization · 8a2c9a5a

由 Grygorii Strashko 提交于 12月 06, 2016

The current implementation CPTS initialization and deinitialization
(represented by cpts_register/unregister()) does too many static
initialization from .ndo_open(), which is reasonable to do once at probe
time instead, and also require caller to allocate memory for struct cpts,
which is internal for CPTS driver in general.

This patch splits CPTS initialization and deinitialization on two parts:

- static initializtion cpts_create()/cpts_release() which expected to be
executed when parent driver is probed/removed;

- dynamic part cpts_register/unregister() which expected to be executed
when network device is opened/closed.

As result, current code of CPTS parent driver - CPSW - will be simplified
(and it also will allow simplify adding support for Keystone 2 devices in
the future), plus more initialization errors will be catched earlier. In
addition, this change allows to clean up cpts.h for the case when CPTS is
disabled.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a2c9a5a

net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs · 2a79df3e

由 Grygorii Strashko 提交于 12月 06, 2016

CPTS module and IRQs are always enabled when CPTS is registered,
before starting overflow check work, and disabled during
deregistration, when overflow check work has been canceled already.
So, It doesn't require to (re)enable CPTS module and IRQs in
cpts_overflow_check().
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a79df3e

net: ethernet: ti: cpts: clean up event list if event pool is empty · e4439fa8

由 WingMan Kwok 提交于 12月 06, 2016

When a CPTS user does not exit gracefully by disabling cpts
timestamping and leaving a joined multicast group, the system
continues to receive and timestamps the ptp packets which eventually
occupy all the event list entries.  When this happns, the added code
tries to remove some list entries which are expired.
Signed-off-by: NWingMan Kwok <w-kwok2@ti.com>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4439fa8

net: ethernet: ti: cpts: disable cpts when unregistered · 8fcd6891

由 Grygorii Strashko 提交于 12月 06, 2016

The cpts now is left enabled after unregistration.
Hence, disable it in cpts_unregister().
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fcd6891

net: ethernet: ti: cpts: fix registration order · 6c691405

由 Grygorii Strashko 提交于 12月 06, 2016

The ptp clock registered before spinlock, which is protecting it, and
before timecounter and cyclecounter initialization in cpts_register().

So, ensure that ptp clock is registered the last, after everything
else is done.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c691405

net: ethernet: ti: cpts: fix unbalanced clk api usage in cpts_register/unregister · fd123a94

由 Grygorii Strashko 提交于 12月 06, 2016

There are two issues with TI CPTS code which are reproducible when TI
CPSW ethX device passes few up/down iterations:
- cpts refclk prepare counter continuously incremented after each
up/down iteration;
- devm_clk_get(dev, "cpts") is called many times.

Hence, fix these issues by using clk_disable_unprepare() in
cpts_clk_release() and skipping devm_clk_get() if cpts refclk has been
acquired already.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd123a94

net: ethernet: ti: cpsw: minimize direct access to struct cpts · b63ba58e

由 Grygorii Strashko 提交于 12月 06, 2016

This will provide more flexibility in changing CPTS internals and also
required for further changes.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b63ba58e

net: ethernet: ti: allow cpts to be built separately · c8395d4e

由 Grygorii Strashko 提交于 12月 06, 2016

TI CPTS IP is used as part of TI OMAP CPSW driver, but it's also
present as part of NETCP on TI Keystone 2 SoCs. So, It's required
to enable build of CPTS for both this drivers and this can be
achieved by allowing CPTS to be built separately.

Hence, allow cpts to be built separately and convert it to be
a module as both CPSW and NETCP drives can be built as modules.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8395d4e

net: ethernet: ti: cpts: switch to readl/writel_relaxed() · 391fd6ca

由 Grygorii Strashko 提交于 12月 06, 2016

Switch to readl/writel_relaxed() APIs, because this is recommended
API and the CPTS IP is reused on Keystone 2 SoCs
where LE/BE modes are supported.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

391fd6ca

07 12月, 2016 19 次提交

Merge branch 'bnxt_en-RDMA' · d3243aef

由 David S. Miller 提交于 12月 07, 2016

Michael Chan says:

====================
bnxt_en: Add interface to support RDMA driver.

This series adds an interface to support a brand new RDMA driver bnxt_re.
The first step is to re-arrange some code so that pci_enable_msix() can
be called during pci probe.  The purpose is to allow the RDMA driver to
initialize and stay initialized whether the netdev is up or down.

Then we make some changes to VF resource allocation so that there is
enough resources to support RDMA.

Finally the last patch adds a simple interface to allow the RDMA driver to
probe and register itself with any bnxt_en devices that support RDMA.
Once registered, the RDMA driver can request MSIX, send fw messages, and
receive some notifications.

v2: Fixed kbuild test robot warnings.

David, please consider this series for net-next.  Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3243aef

bnxt_en: Add interface to support RDMA driver. · a588e458

由 Michael Chan 提交于 12月 07, 2016

Since the network driver and RDMA driver operate on the same PCI function,
we need to create an interface to allow the RDMA driver to share resources
with the network driver.

1. Create a new bnxt_en_dev struct which will be returned by
bnxt_ulp_probe() upon success.  After that, all calls from the RDMA driver
to bnxt_en will pass a pointer to this struct.

2. This struct contains additional function pointers to register, request
msix, send fw messages, register for async events.

3. If the RDMA driver wants to enable RDMA on the function, it needs to
call the function pointer bnxt_register_device().  A ulp_ops structure
is passed for RCU protected upcalls from bnxt_en to the RDMA driver.

4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg()
function pointer.

5. 1 stats context is reserved when the RDMA driver registers.  MSIX
and completion rings are reserved when the RDMA driver calls
bnxt_request_msix() function pointer.

6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources
will be cleaned up.

v2: Fixed 2 uninitialized variable warnings.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a588e458

bnxt_en: Refactor the driver registration function with firmware. · a1653b13

由 Michael Chan 提交于 12月 07, 2016

The driver register function with firmware consists of passing version
information and registering for async events.  To support the RDMA driver,
the async events that we need to register may change.  Separate the
driver register function into 2 parts so that we can just update the
async events for the RDMA driver.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1653b13

bnxt_en: Reserve RDMA resources by default. · e4060d30

由 Michael Chan 提交于 12月 07, 2016

If the device supports RDMA, we'll setup network default rings so that
there are enough minimum resources for RDMA, if possible. However, the
user can still increase network rings to the max if he wants. The actual
RDMA resources won't be reserved until the RDMA driver registers.

v2: Fix compile warning when BNXT_CONFIG_SRIOV is not set.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4060d30

bnxt_en: Improve completion ring allocation for VFs. · 7b08f661

由 Michael Chan 提交于 12月 07, 2016

All available remaining completion rings not used by the PF should be
made available for the VFs so that there are enough rings in the VF to
support RDMA.  The earlier workaround code of capping the rings by the
statistics context is removed.

When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources()
to restore FW resources.  Later on we need to add some logic to account
for RDMA resources.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b08f661

bnxt_en: Move function reset to bnxt_init_one(). · aa8ed021

由 Michael Chan 提交于 12月 07, 2016

Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by
the RDMA driver before the network device is opened.  So we cannot do
function reset in bnxt_open() which will clear all the resources.

The proper place to do function reset now is in bnxt_init_one().
If we get AER, we'll do function reset as well.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa8ed021

bnxt_en: Enable MSIX early in bnxt_init_one(). · 7809592d

由 Michael Chan 提交于 12月 07, 2016

To better support the new RDMA driver, we need to move pci_enable_msix()
from bnxt_open() to bnxt_init_one().  This way, MSIX vectors are available
to the RDMA driver whether the network device is up or down.

Part of the existing bnxt_setup_int_mode() function is now refactored into
a new bnxt_init_int_mode().  bnxt_init_int_mode() is called during
bnxt_init_one() to enable MSIX.  The remaining logic in
bnxt_setup_int_mode() to map the IRQs to the completion rings is called
during bnxt_open().

v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7809592d

bnxt_en: Add bnxt_set_max_func_irqs(). · 33c2657e

由 Michael Chan 提交于 12月 07, 2016

By refactoring existing code into this new function.  The new function
will be used in subsequent patches.

v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33c2657e

net: sock_rps_record_flow() is for connected sockets · 5b8e2f61

由 Eric Dumazet 提交于 12月 06, 2016

Paolo noticed a cache line miss in UDP recvmsg() to access
sk_rxhash, sharing a cache line with sk_drops.

sk_drops might be heavily incremented by cpus handling a flood targeting
this socket.

We might place sk_drops on a separate cache line, but lets try
to avoid wasting 64 bytes per socket just for this, since we have
other bottlenecks to take care of.

sock_rps_record_flow() should only access sk_rxhash for connected
flows.

Testing sk_state for TCP_ESTABLISHED covers most of the cases for
connected sockets, for a zero cost, since system calls using
sock_rps_record_flow() also access sk->sk_prot which is on the
same cache line.

A follow up patch will provide a static_key (Jump Label) since most
hosts do not even use RFS.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b8e2f61

i40e: move all updates for VLAN mode into i40e_sync_vsi_filters · 489a3265

由 Jacob Keller 提交于 11月 11, 2016

In a similar fashion to how we handled exiting VLAN mode, move the logic
in i40e_vsi_add_vlan into i40e_sync_vsi_filters. Extract this logic into
its own function for ease of understanding as it will become quite
complex.

The new function, i40e_correct_mac_vlan_filters() correctly updates all
filters for when we need to enter VLAN mode, exit VLAN mode, and also
enforces the PVID when assigned.

Call i40e_correct_mac_vlan_filters from i40e_sync_vsi_filters passing it
the number of active VLAN filters, and the two temporary lists.

Remove the function for updating VLAN=0 filters from i40e_vsi_add_vlan.

The end result is that the logic for entering and exiting VLAN mode is
in one location which has the most knowledge about all filters. This
ensures that we always correctly have the non-VLAN filters assigned to
VID=0 or VID=-1 regardless of how we ended up getting to this result.

Additionally this enforces the PVID at sync time so that we know for
certain that an assigned PVID results in only filters with that PVID
will be added to the firmware.

Change-ID: I895cee81e9c92d0a16baee38bd0ca51bbb14e372
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

489a3265

i40e: use (add|rm)_vlan_all_mac helper functions when changing PVID · 9af52f60

由 Jacob Keller 提交于 11月 11, 2016

The current flow for adding or updating the PVID for a VF uses
i40e_vsi_add_vlan and i40e_vsi_kill_vlan which each take, then release
the hash lock. In addition the two functions also must take special care
that they do not perform VLAN mode changes as this will make the code in
i40e_ndo_set_vf_port_vlan behave incorrectly.

Fix these issues by using the new helper functions i40e_add_vlan_all_mac
and i40e_rm_vlan_all_mac which expect the hash lock to already be taken.
Additionally these functions do not perform any state updates in regards
to VLAN mode, so they are safe to use in the PVID update flow.

It should be noted that we don't need the VLAN mode update code here,
because there are only a few flows here.

(a) we're adding a new PVID
  In this case, if we already had VLAN filters the VSI is knocked
  offline so we don't need to worry about pre-existing VLAN filters

(b) we're replacing an existing PVID
  In this case, we can't have any VLAN filters except those with the old
  PVID which we already take care of manually.

(c) we're removing an existing PVID
  Similarly to above, we can't have any existing VLAN filters except
  those with the old PVID which we already take care of correctly.

Because of this, we do not need (or even want) the special accounting
done in i40e_vsi_add_vlan, so use of the helpers is a saner alternative.
It also opens the door for a future patch which will refactor the flow
of i40e_vsi_add_vlan now that it is not needed in this function.

Change-ID: Ia841f63da94e12b106f41cf7d28ce8ce92f2ad99
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

9af52f60

i40e: factor out addition/deletion of VLAN per each MAC address · 490a4ad3

由 Jacob Keller 提交于 11月 11, 2016

A future refactor of how the PF assigns a PVID to a VF will want to be
able to add and remove a block of filters by VLAN without worrying about
accidentally triggering the accounting for I40E_VLAN_ANY. Additionally
the PVID assignment would like to be able to batch several changes under
one use of the mac_filter_hash_lock.

Factor out the addition and deletion of a VLAN on all MACs into their
own function which i40e_vsi_(add|kill)_vlan can use. These new functions
expect the caller to take the hash lock, as well as perform any
necessary accounting for updating I40E_VLAN_ANY filters if we are now
operating under VLAN mode.

Change-ID: If79e5b60b770433275350a74b3f1880333a185d5
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

490a4ad3

i40e: delete filter after adding its replacement when converting · 75697025

由 Jacob Keller 提交于 11月 11, 2016

Fix a subtle issue with the code for converting VID=-1 filters into VID=0
filters when adding a new VLAN. Previously the code deleted the VID=-1
filter, and then added a new VID=0 filter. In the rare case that the
addition fails due to -ENOMEM, we end up completely deleting the filter
which prevents recovery if memory pressure subsides. While it is not
strictly an issue because it is likely that memory issues would result
in many other problems, we shouldn't delete the filter until after the
addition succeeds.

Change-ID: Icba07ddd04ecc6a3b27c2e29f2c1c8673d266826
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

75697025

i40e: refactor i40e_update_filter_state to avoid passing aq_err · ac9e2390

由 Jacob Keller 提交于 11月 11, 2016

The current caller of i40e_update_filter_state incorrectly passes
aq_ret, an i40e_status variable, instead of the expected aq_err. This
happens to work because i40e_status is actually just a typedef integer,
and 0 is still the successful return. However i40e_update_filter_state
has special handling for ENOSPC which is currently being ignored.

Also notice that firmware does not update the per-filter response for
many types of errors, such as EINVAL. Thus, modify the filter setup so
that the firmware response memory is pre-set with I40E_AQC_MM_ERR_NO_RES.

This enables us to refactor i40e_update_filter_state, removing the need
to pass aq_err and avoiding a need for having 3 different flows for
checking the filter state.

The resulting code for i40e_update_filter_state is much simpler, only
a single loop and we always check each filter response value every time.
Since we pre-set the response value to match our expected error this
correctly works for all success and error flows.

Change-ID: Ie292c9511f34ee18c6ef40f955ad13e28b7aea7d
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

ac9e2390

i40e: recalculate vsi->active_filters from hash contents · 38326218

由 Jacob Keller 提交于 11月 11, 2016

Previous code refactors have accidentally caused issues with the
counting of active_filters. Avoid similar issues in the future by simply
re-counting the active filters every time after we handle add and delete
of all the filters. Additionally this allows us to simplify the check
for when we exit promiscuous mode since we can combine the check for
failed filters at the same time.

Additionally since we recount filters at the end we need to set
vsi->promisc_threshold as well.

The resulting code takes a bit longer since we do have to loop over
filters again. However, the result is more readable and less likely to
become incorrect due to failed accounting of filters in the future.
Finally, this ensures that it is not possible for vsi->active_filters to
ever underflow since we never decrement it.

Change-ID: Ib4f3a377e60eb1fa6c91ea86cc02238c08edd102
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

38326218

i40e: defeature support for PTP L4 frame detection on XL710 · 1e28e861

由 Jacob Keller 提交于 11月 11, 2016

A product decision has been made to defeature detection of PTP frames
over L4 (UDP) on the XL710 MAC. Do not advertise support for L4
timestamping.

Change-ID: I41fbb0f84ebb27c43e23098c08156f2625c6ee06
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1e28e861

i40e: lock service task correctly · 91089033

由 Mitch Williams 提交于 11月 21, 2016

The service task lock was being set in the scheduling function, not the
actual service task. This would potentially leave the bit set for a long
time before the task actually ran. Furthermore, if the service task
takes too long, it calls the schedule function to reschedule itself -
which would fail to take the lock and do nothing.

Instead, set and clear the lock bit in the service task itself. In the
process, get rid of the i40e_service_event_complete() function, which is
really just two lines of code that can be put right in the service task
itself.

Change-ID: I83155e682b686121e2897f4429eb7d3f7c669168
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

91089033

i40e: Add functions which apply correct PHY access method for read and write operation · f62ba914

由 Michal Kosiarz 提交于 11月 21, 2016

Depending on external PHY type, register access method should be
different. Clause22 or Clause45 can be chosen for different PHYs.
Implemented functions apply correct access method for used device.

Change-ID: If39d5f0da9c0b905a8cbdc1ab89885535e7d0426
Signed-off-by: NMichal Kosiarz <michal.kosiarz@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f62ba914

i40e: Add FEC for 25g · 60f000a4

由 Carolyn Wyborny 提交于 11月 21, 2016

This patch adds adminq support for Forward Error
Correction ("FEC")for 25g products.

Change-ID: Iaff4910737c239d2c730e5c22a313ce9c37d3964
Signed-off-by: NCarolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Signed-off-by: NJacek Naczyk <jacek.naczyk@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

60f000a4