Commits · 2094acbb714e24e464c810c2d8fa57493fcb25a6 · openanolis / cloud-kernel

30 Sep, 2015 12 commits

net/ipv4: Pass proto as u8 instead of u16 in ip_check_mc_rcu · 2094acbb

Committed by Alexander Duyck on Sep 28, 2015

This patch updates ip_check_mc_rcu so that protocol is passed as a u8
instead of a u16.

The motivation is just to avoid any unneeded type transitions since some
systems will require an instruction to zero-extend a u8 field to a u16.
It also makes it a bit more readable that protocol is a u8,
so no byte ordering changes are needed to pass it.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RESEND: [PATCH v3 net-next] sky2: use random address if EEPROM is bad · 0f50c10d

Committed by Liviu Dudau on Sep 28, 2015

On some embedded systems the EEPROM does not contain a valid MAC address.
In that case it is better to fall back to a generated MAC address and
let init scripts fix the value later.
Reported-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Changed handcoded setup to use eth_hw_addr_random() and to save new address into HW]
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
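
The fallback described in the sky2 message can be sketched in plain userspace C. This is only an illustration, not the driver code: the helper names (mac_is_valid, mac_make_random, mac_from_eeprom_or_random) are hypothetical, and rand() stands in for whatever the kernel's eth_hw_addr_random() uses internally.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Treat an address as valid when it is unicast and not all zeroes. */
static bool mac_is_valid(const uint8_t addr[ETH_ALEN])
{
    static const uint8_t zero[ETH_ALEN] = {0};

    if (addr[0] & 0x01)                 /* multicast/broadcast bit is set */
        return false;
    return memcmp(addr, zero, ETH_ALEN) != 0;
}

/* Hypothetical helper: random, locally administered, unicast address. */
static void mac_make_random(uint8_t addr[ETH_ALEN])
{
    for (int i = 0; i < ETH_ALEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff);
    addr[0] &= 0xfe;                    /* clear the multicast bit */
    addr[0] |= 0x02;                    /* set the locally administered bit */
}

/*
 * Keep the EEPROM address when it is usable, otherwise generate one.
 * Returns true when the EEPROM value was kept.
 */
static bool mac_from_eeprom_or_random(const uint8_t eeprom[ETH_ALEN],
                                      uint8_t out[ETH_ALEN])
{
    if (mac_is_valid(eeprom)) {
        memcpy(out, eeprom, ETH_ALEN);
        return true;
    }
    mac_make_random(out);
    return false;
}
```

A caller would pass the six bytes read from the EEPROM and then program `out` into the hardware, which matches the note in the message about saving the new address into HW.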

netpoll: Drop budget parameter from NAPI polling call hierarchy · 822d54b9

Committed by Alexander Duyck on Sep 28, 2015

For some reason we were carrying the budget value around between the
various calls to napi->poll. If for example one of the drivers called had
a bug in which it returned a non-zero value for work this could result in
the budget value becoming negative.

Rather than carry around a value of budget that is 0 or less we can instead
just loop through and pass 0 to each napi->poll call. If any driver
returns a value for work done that is non-zero then we can report that
driver and continue rather than allowing a bad actor to make the budget
value negative and pass that negative value to napi->poll.

Note, the only actual change here is that instead of letting budget become
negative we are keeping it at 0 regardless of the value returned for work
since it should not be possible for the polling routine to do any actual
work with a budget of 0. So if the polling routine returns a non-0 value
we are just reporting it and continuing with a budget of 0 rather than
letting that work value be subtracted from the budget of 0.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
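
The behaviour described in the netpoll message (always pass a budget of 0, report any driver that claims work anyway, and keep going) can be sketched as follows. The struct and function names are hypothetical stand-ins and not the kernel's napi_struct or netpoll code.

```c
#include <stdio.h>

/* Hypothetical stand-in for one driver's NAPI instance. */
struct napi_model {
    const char *name;
    int (*poll)(struct napi_model *napi, int budget);
};

/*
 * Poll every instance with a budget of 0. A poll routine must not do work
 * with a zero budget, so a non-zero return is reported and then ignored
 * instead of being subtracted from a shared budget.
 * Returns the number of misbehaving instances.
 */
static int poll_all_zero_budget(struct napi_model *list, int n)
{
    int offenders = 0;

    for (int i = 0; i < n; i++) {
        int work = list[i].poll(&list[i], 0);

        if (work != 0) {
            fprintf(stderr, "%s: reported work=%d with budget 0\n",
                    list[i].name, work);
            offenders++;
        }
    }
    return offenders;
}

/* Example poll routines: one obeys the budget, one does not. */
static int well_behaved_poll(struct napi_model *napi, int budget)
{
    (void)napi;
    (void)budget;
    return 0;
}

static int buggy_poll(struct napi_model *napi, int budget)
{
    (void)napi;
    (void)budget;
    return 3;   /* claims work despite the zero budget */
}
```

Because no running budget is carried between calls, a buggy driver cannot push a negative value into the next poll call.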

bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906

Committed by Nikolay Aleksandrov on Sep 25, 2015

This patch changes the bridge vlan implementation to use rhashtables
instead of bitmaps. The main motivation behind this change is that we
need extensible per-vlan structures (both per-port and global) so more
advanced features can be introduced and the vlan support can be
extended. I've tried to break this up, but the moment net_port_vlans is
changed the whole API goes away, thus this is a larger patch.
A few short goals of this patch are:
- Extensible per-vlan structs stored in rhashtables and a sorted list
- Keep user-visible behaviour (compressed vlans etc)
- Keep fastpath ingress/egress logic the same (optimizations to come
  later)

Here's a brief list of some of the new features we'd like to introduce:
- per-vlan counters
- vlan ingress/egress mapping
- per-vlan igmp configuration
- vlan priorities
- avoid fdb entries replication (e.g. local fdb scaling issues)

The structure is kept single for both global and per-port entries so as to
avoid code duplication where possible and also because we'll soon introduce
"port0 / aka bridge as port" which should simplify things further
(thanks to Vlad for the suggestion!).

Now we have a per-vlan global rhashtable (bridge-wide) and a per-vlan port
rhashtable. If an entry is added to a port it'll get a pointer to its
global context so it can be quickly accessed later. There's also a
sorted vlan list which is used for stable walks and some user-visible
behaviour such as the vlan ranges, and also for error paths.
VLANs are stored in a "vlan group" which currently contains the
rhashtable, sorted vlan list and the number of "real" vlan entries.
A good side-effect of this change is that it resembles how hw keeps
per-vlan data.
One important note after this change is that if a VLAN is being looked up
in the bridge's rhashtable for filtering purposes (or to check if it's an
existing usable entry, not just a global context) then the new helper
br_vlan_should_use() needs to be used if the vlan is found. In case the
lookup is done only with a port's vlan group, then this check can be
skipped.

Things tested so far:
- basic vlan ingress/egress
- pvids
- untagged vlans
- undef CONFIG_BRIDGE_VLAN_FILTERING
- adding/deleting vlans in different scenarios (with/without global ctx,
  while transmitting traffic, in ranges etc)
- loading/removing the module while having/adding/deleting vlans
- extracting bridge vlan information (user ABI), compressed requests
- adding/deleting fdbs on vlans
- bridge mac change, promisc mode
- default pvid change
- kmemleak ON during the whole time
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvneta_percpu_irq' · 191988e0

Committed by David S. Miller on Sep 29, 2015

Gregory CLEMENT says:

====================
net: mvneta: Switch to per-CPU irq and make rxq_def useful

As stated in the first version: "this patchset reworks the Marvell
neta driver in order to really support its per-CPU interrupts, instead
of faking them as SPI, and allow the use of any RX queue instead of
the hardcoded RX queue 0 that we have currently."

Following the review which has been done, Maxime started adding the
CPU hotplug support. I continued his work a few weeks ago and here is
the result.

Since the 1st version the main change is this CPU hotplug support. In
order to validate it I powered the CPUs up and down while running
iperf. I ran the tests for hours: the kernel didn't crash and the
network interfaces were still usable. Of course it impacted the
performance, but continuously powering the CPUs down and up is not
something we usually do.

I also reorganized the series: the first 3 patches should go through
the irq subsystem, whereas the 4 others should go to the network
subsystem.

However, there is a runtime dependency between the two parts. Patch 5
depends on patch 3 to be able to use the percpu irq.

Thanks,

Gregory

PS: Thanks to Willy who gave me some pointers on how to deal with the
NAPI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Statically assign queues to CPUs · f8642885

Committed by Maxime Ripard on Sep 25, 2015

Since the switch to per-CPU interrupts, we lost the ability to set which
CPU was going to receive our RX interrupt, which was now only the CPU on
which the mvneta_open function was run.

We can now assign our queues to their respective CPUs, and make sure only
this CPU is going to handle our traffic.

This also paves the road to be able to change that at runtime, and later on
to support RSS.

[gregory.clement@free-electrons.com]: hardened the CPU hotplug support.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Allow different queues · d8936657

Committed by Maxime Ripard on Sep 25, 2015

The mvneta driver allows changing the default RX queue through the rxq_def
kernel parameter.

However, the current code doesn't allow any value but 0. This is
actively checked for in the driver's probe because the driver makes a
number of assumptions and takes a number of shortcuts in order to just use
that RX queue.

Remove these limitations in order to be able to specify any available
queue.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Handle per-cpu interrupts · 12bb03b4

Committed by Maxime Ripard on Sep 25, 2015

Now that our interrupt controller is allowing us to use per-CPU interrupts,
actually use it in the mvneta driver.

This obviously involves reworking the driver to have a CPU-local NAPI
structure, and reporting incoming packets using that structure.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Fix CPU_MAP registers initialisation · 2502d0ef

Committed by Maxime Ripard on Sep 25, 2015

The CPU_MAP register is duplicated for each CPU, each instance being at
a different address.

However, the code so far was using CONFIG_NR_CPUS to initialise the CPU_MAP
register of each CPU, while the SoCs embed at most 4 CPUs.

This is especially an issue with multi_v7_defconfig, where CONFIG_NR_CPUS
is currently set to 16, resulting in writes to registers that are not
CPU_MAP.

Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.8+
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
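
To make the CPU_MAP problem concrete, here is a small model in which only the first 4 slots of a register window are CPU_MAP instances. Clamping the loop to the hardware maximum is one possible bound and is an assumption here, since the message does not say which bound the fix uses. All names (regs_model, init_cpu_map, HW_MAX_CPUS, BUILD_NR_CPUS) are hypothetical.

```c
#include <stdint.h>

#define HW_MAX_CPUS     4    /* "the SoCs embed at most 4 CPUs" */
#define BUILD_NR_CPUS  16    /* stands in for CONFIG_NR_CPUS = 16 */

/* Model register window: slots 0..3 are CPU_MAP, the rest are unrelated. */
struct regs_model {
    uint32_t slot[BUILD_NR_CPUS];
};

/*
 * Write the same value to the CPU_MAP instance of each CPU, never going
 * past the number of instances the hardware actually has.
 * Returns how many registers were written.
 */
static int init_cpu_map(struct regs_model *r, int ncpus, uint32_t value)
{
    int written = 0;

    for (int cpu = 0; cpu < ncpus && cpu < HW_MAX_CPUS; cpu++) {
        r->slot[cpu] = value;
        written++;
    }
    return written;
}
```

Calling `init_cpu_map(&regs, BUILD_NR_CPUS, v)` touches only the four CPU_MAP slots, whereas an unclamped loop would overwrite slots 4 to 15 as well.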

irqchip: armada-370-xp: Rework per-cpu interrupts handling · 080481f9

Committed by Maxime Ripard on Sep 25, 2015

The MPIC driver currently has a list of interrupts to handle as per-cpu.

Since the timer, fabric and neta interrupts were the only per-cpu
interrupts in the system, we can now remove the switch and just check for
the hardware irq number to determine whether a given interrupt is per-cpu
or not.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irq: Export per-cpu irq allocation and de-allocation functions · aec2e2ad

Committed by Maxime Ripard on Sep 25, 2015

Some drivers might use the per-cpu interrupts and still might be built as a
module. Export request_percpu_irq and free_percpu_irq to these users, which
also makes it consistent with enable/disable_percpu_irq that were exported.
Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genirq: Fix the documentation of request_percpu_irq · a1b7febd

Committed by Maxime Ripard on Sep 25, 2015

The documentation of request_percpu_irq is confusing and suggests that the
interrupt is not enabled at all, while it is actually enabled on the local
CPU.

Clarify that.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Sep, 2015 28 commits

net: help compiler generate better code in eth_get_headlen · 8a4683a5

Committed by Jesper Dangaard Brouer on Sep 28, 2015

Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC))
generated suboptimal assembler code in eth_get_headlen().

This early return coding style is usually not an issue on superscalar CPUs,
but the compiler chose to put the return statement after this very unlikely
branch, thus creating a larger jump down to the likely code path.

Performance-wise, I could measure slightly fewer L1-icache-load-misses
and fewer branch-misses, and an improvement of 1 nanosec with an IP-forwarding
use-case with 257-byte packets with ixgbe (CPU i7-4790K @ 4.00GHz).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix CWV being too strict on thin streams · d2e1339f

Committed by Bendik Rønning Opstad on Sep 23, 2015

Application limited streams such as thin streams, that transmit small
amounts of payload in relatively few packets per RTT, can be prevented
from growing the CWND when in congestion avoidance. This leads to
increased sojourn times for data segments in streams that often transmit
time-dependent data.

Currently, a connection is considered CWND limited only after having
successfully transmitted at least one packet with new data, while at the
same time failing to transmit some unsent data from the output queue
because the CWND is full. Applications that produce small amounts of
data may be left in a state where it is never considered to be CWND
limited, because all unsent data is successfully transmitted each time
an incoming ACK opens up for more data to be transmitted in the send
window.

Fix by always testing whether the CWND is fully used after successful
packet transmissions, such that a connection is considered CWND limited
whenever the CWND has been filled. This is the correct behavior as
specified in RFC2861 (section 3.1).

Cc: Andreas Petlund <apetlund@simula.no>
Cc: Carsten Griwodz <griff@simula.no>
Cc: Jonas Markussen <jonassm@ifi.uio.no>
Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Cc: Mads Johannessen <madsjoh@ifi.uio.no>
Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
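
The difference between the old and new CWND-limited tests in the CWV message can be illustrated with a simplified model. It is a sketch of the rule as the message words it, not the kernel's TCP output path; the struct and both functions are hypothetical, and the state is assumed to be sampled right after a transmit attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of a connection right after a transmit attempt. */
struct flow_model {
    uint32_t cwnd;       /* congestion window, in packets */
    uint32_t in_flight;  /* packets currently in flight */
    uint32_t unsent;     /* packets still waiting in the output queue */
};

/*
 * Old rule: CWND limited only if at least one new packet was sent AND
 * some unsent data remains queued because the CWND is full.
 */
static bool cwnd_limited_old(const struct flow_model *f, uint32_t sent_now)
{
    bool blocked = f->unsent > 0 && f->in_flight >= f->cwnd;

    return sent_now > 0 && blocked;
}

/*
 * New rule: after a successful transmission, CWND limited whenever the
 * CWND has been filled, whether or not data is left in the queue.
 */
static bool cwnd_limited_new(const struct flow_model *f, uint32_t sent_now)
{
    return sent_now > 0 && f->in_flight >= f->cwnd;
}
```

A thin stream that fills the window exactly and leaves nothing queued is the case where the two rules disagree: the old test reports "not limited", so the CWND cannot grow, while the new test reports "limited".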

cxgb4: Add HW timesptamp support for RX · 5e2a5ebc

Committed by Hariprasad Shenai on Sep 28, 2015

Adds support for the ethtool get time stamp ioctl, which is used by
tcpdump to get the supported time stamp types.

eg: tcpdump -i eth5 -J
Time stamp types for eth5 (use option -j to set):
  host (Host)
  adapter_unsynced (Adapter, not synced with system time)

Adds support for adapter unsynced mode, by adding SIOCSHWTSTAMP support
in driver.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix Hisilicon Network Subsystem Support Compilation · e4600d69

Committed by huangdaode on Sep 27, 2015

This patch fixes the compilation error with arm allmodconfig. This error is
generated due to the unavailability of readq() on 32-bit platforms, which was
found during net-next daily compilation. At the same time, fix all the
hns drivers compilation warnings.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: zhaungyuzeng <Yisen.zhuang@huawei.com>
Signed-off-by: kenneth Lee <liguozhu@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: irda: pxaficp_ir: dmaengine conversion · 1273bc57

Committed by Robert Jarzmik on Sep 26, 2015

Convert pxaficp_ir to dmaengine. As the pxa architecture is shifting from
raw DMA register access to the pxa_dma dmaengine driver, convert this
driver to dmaengine.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: irda: pxaficp_ir: convert to readl and writel · 89fa5724

Committed by Robert Jarzmik on Sep 26, 2015

Convert the pxa IRDA driver to readl and writel primitives, and remove
another set of direct register accesses. This leaves only the DMA
register accesses, which will be dealt with by the dmaengine conversion.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Petr Cvek <petr.cvek@tul.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: irda: pxaficp_ir: use sched_clock() for time management · be01891e

Committed by Robert Jarzmik on Sep 26, 2015

Instead of using the OS timer directly through register access,
use the standard sched_clock(), which will end up in OSCR reading
anyway.

This is a first step for direct access register removal and machine
specific code removal from this driver.

This commit changes the behavior, as previously the minimum turnaround
time was counted in 76ns steps, while with this patch it is counted in
microsecond steps. The strictly equal formula would have been:
	    while ((sched_clock() - si->last_clk) * 76 < mtt)
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Remove unneeded FEATURES_NEED_QUIESCE definition · 5b40f709

Committed by Fabio Estevam on Sep 25, 2015

There is no need to have FEATURES_NEED_QUIESCE defined as we
can simply use NETIF_F_RXCSUM instead as done in other parts
of the driver.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove redundant oif checks in rt6_device_match · 17fb0b2b

Committed by David Ahern on Sep 25, 2015

The oif has already been checked to be non-zero; the 2 additional
checks on oif within that if (oif) {...} block are redundant.

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan78xx: Return 0 when lan78xx_suspend() has no error. · 49d28b56

Committed by Woojung.Huh@microchip.com on Sep 25, 2015

lan78xx_suspend() may return non-zero from lan78xx_write_reg() in some scenarios.
Fix to return 0 when lan78xx_suspend() has no error.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next' · 366f02d8

Committed by David S. Miller on Sep 28, 2015

Or Gerlitz says:

====================
Mellanox mlx5 driver update

Bunch of changes from the team, while warming engines for the
upcoming SRIOV support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Update health syndromes · 171bb2c5

Committed by Eli Cohen on Sep 25, 2015

Update new health monitored syndromes and their descriptions.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Fix wrong name in struct · 78ccb258

Committed by Eli Cohen on Sep 25, 2015

The name refers to syndrome, so use ext_synd instead of ext_sync.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: New init and exit flow for mlx5_core · a31208b1

Committed by Majd Dibbiny on Sep 25, 2015

In the new flow, we separate the pci initialization and teardown from the
initialization and teardown of the other resources.

init_one calls mlx5_pci_init that handles the pci resources initialization.
It then calls mlx5_load_one to initialize the remainder of the resources.

When removing a device, remove_one is invoked. However, now remove_one
calls mlx5_unload_one to free all the resources except the pci resources.
When mlx5_unload_one returns, mlx5_pci_close is called to free the pci
resources.

The above separation will allow us to implement the pci error handlers and
suspend and resume callbacks.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
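
The split between PCI setup and the rest of the device resources can be pictured with a toy model that records the call order. The model_* names are hypothetical and only mirror the roles of init_one, mlx5_pci_init, mlx5_load_one, remove_one, mlx5_unload_one and mlx5_pci_close named in the message. Releasing the PCI resources when the load step fails is an assumption of this sketch.

```c
#include <string.h>

/* Toy device that records which steps ran, in order. */
struct dev_model {
    int pci_ready;
    int loaded;
    char trace[128];
};

static void model_trace(struct dev_model *d, const char *step)
{
    strncat(d->trace, step, sizeof(d->trace) - strlen(d->trace) - 1);
}

static int model_pci_init(struct dev_model *d)
{
    d->pci_ready = 1;
    model_trace(d, "pci_init;");
    return 0;
}

static void model_pci_close(struct dev_model *d)
{
    d->pci_ready = 0;
    model_trace(d, "pci_close;");
}

/* fail != 0 simulates an error while bringing up the other resources. */
static int model_load_one(struct dev_model *d, int fail)
{
    if (fail)
        return -1;
    d->loaded = 1;
    model_trace(d, "load_one;");
    return 0;
}

static void model_unload_one(struct dev_model *d)
{
    d->loaded = 0;
    model_trace(d, "unload_one;");
}

/* Probe path: PCI resources first, then everything else. */
static int model_init_one(struct dev_model *d, int fail_load)
{
    int err = model_pci_init(d);

    if (err)
        return err;
    err = model_load_one(d, fail_load);
    if (err)
        model_pci_close(d);   /* assumed unwind on failure */
    return err;
}

/* Remove path: everything else first, PCI resources last. */
static void model_remove_one(struct dev_model *d)
{
    model_unload_one(d);
    model_pci_close(d);
}
```

With the two layers separated like this, a suspend/resume or PCI error handler could call only the load/unload pair while leaving the PCI resources in place, which is the motivation the message gives.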

net/mlx5_core: Fix notification of page supplement error · a8ffe63e

Committed by Eli Cohen on Sep 25, 2015

Some errors did not result in notifying firmware that the page request
could not be fulfilled. Fix this and put the notification logic into a
separate function.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Fix async commands return code · be87544d

Committed by Eli Cohen on Sep 25, 2015

In case of async command completion, the error code returned should take
into account the command completion status.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Remove redundant "err" variable usage · 6c3dbd2d

Committed by Achiad Shochat on Sep 25, 2015

Cosmetic change.
Do not use an err variable just to assign and return it.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Fix struct type in the DESTROY_TIR/TIS device commands · 97909302

Committed by Saeed Mahameed on Sep 25, 2015

Used the output mailbox format for input mailbox.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Priv state flag not rolled-back upon netdev open error · 343b29f3

Committed by Achiad Shochat on Sep 25, 2015

The private mlx5 state flag that indicates that the netdev is
opened is set at the beginning of the netdev open flow.
In case an error occurred later in the mlx5 netdev open flow, this
flag was not cleared, remaining set although the netdev is actually
closed.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
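
The rollback this message describes amounts to clearing the "opened" flag on the error path. A minimal sketch, with a hypothetical flag bit and function name rather than the driver's own:

```c
#include <stdbool.h>

#define STATE_OPENED 0x1u   /* hypothetical bit for "netdev is opened" */

struct priv_model {
    unsigned int state;
};

/*
 * The flag is set at the start of the open flow. If a later step fails,
 * it is cleared again so the state matches the closed device.
 */
static int open_model(struct priv_model *p, bool later_step_fails)
{
    p->state |= STATE_OPENED;

    if (later_step_fails) {
        p->state &= ~STATE_OPENED;   /* roll back on error */
        return -1;
    }
    return 0;
}
```

Without the rollback line, a failed open would leave `STATE_OPENED` set and any code that trusts the flag would treat a closed device as open.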

tools: bpf_jit_disasm: make get_last_jit_image return unsigned · 4de61ba2

Committed by Andrzej Hajda on Sep 25, 2015

The function always returns non-negative values.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid reorders for TFO passive connections · 7c85af88

Committed by Eric Dumazet on Sep 24, 2015

We found that a TCP Fast Open passive connection was vulnerable
to reorders, as the exchange might look like

[1] C -> S S <FO ...> <request>
[2] S -> C S. ack request <options>
[3] S -> C . <answer>

packets [2] and [3] can be generated at almost the same time.

If C receives the 3rd packet before the 2nd, it will drop it as
the socket is in SYN_SENT state and expects a SYNACK.

S will have to retransmit the answer.

Current OOO avoidance in linux is defeated because SYNACK
packets are attached to the LISTEN socket, while DATA packets
are attached to the children. They might be sent by different cpus,
and different TX queues might be selected.

It turns out that for TFO, we created a child, which is a
full blown socket in TCP_SYN_RECV state, and we simply can attach
the SYNACK packet to this socket.

This means that at the time tcp_sendmsg() pushes DATA packet,
skb->ooo_okay will be set iff the SYNACK packet had been sent
and TX completed.

This removes the reorder source at the host level.

We also removed the export of tcp_try_fastopen(), as it is no
longer called from IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · eae93fe4

Committed by David S. Miller on Sep 28, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-09-28

This series contains updates to i40e, i40evf and igb to resolve issues
seen and reported by Red Hat.

Kiran moves i40e_get_head() in preparation for the refactor of the Tx
timeout logic, so that it can be used in other areas of the driver.
Refactored the driver timeout logic by issuing a writeback request via
a software interrupt to the hardware the first time the driver detects
a hang.  This was due to the driver being too aggressive in resetting a
hung queue.

Shannon adds the GRE protocol to the transmit checksum encoding.

Anjali fixes an issue of forcing writeback too often, which caused us to
not benefit from NAPI.  We now disable force writeback in the clean
routine for X710 and XL710 adapters.  The X722 adapters do not enable
interrupt to force a writeback and benefit from WB_ON_ITR and so force
WB is left enabled for those adapters.  Fixed a possible deadlock issue
where sync_vsi_filters() can be called directly under RTNL or through
the timer subtask without RTNL.  So update the flow to see if we are
already under RTNL before trying to grab it.

Stefan Assmann provides a fix for igb where SR-IOV was not getting
enabled properly and we ran into a NULL pointer if the max_vfs module
parameter is specified.  This is prevented by setting the
IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs().

v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch
    to the series, as it resolves another issue seen and reported by
    Red Hat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: assume MSI-X interrupts during initialization · cbfe360a

Committed by Stefan Assmann on Sep 17, 2015

In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()

This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer if the max_vfs
module parameter is specified (adapter->vf_data does not get allocated,
crash on accessing the structure).

[    7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[    7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[    7.419370] PGD 0
[    7.419373] Oops: 0002 [#1] SMP
[    7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[    7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[    7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[    7.419431] Call Trace:
[    7.419442]  [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[    7.419447]  [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0

Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
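
The ordering problem in the igb message can be reproduced in a toy model: the VF probe step only allocates per-VF data when the MSI-X flag is already set, and the real capability check happens afterwards. Everything here (adapter_model, FLAG_HAS_MSIX, the three functions) is a hypothetical simplification, not the igb code.

```c
#include <stdlib.h>

#define FLAG_HAS_MSIX 0x1u   /* hypothetical stand-in for IGB_FLAG_HAS_MSIX */

struct adapter_model {
    unsigned int flags;
    unsigned int max_vfs;
    int *vf_data;            /* stays NULL when SR-IOV setup is skipped */
};

/* SR-IOV setup is skipped unless MSI-X is flagged and VFs are requested. */
static void probe_vfs(struct adapter_model *a)
{
    if (!(a->flags & FLAG_HAS_MSIX) || a->max_vfs == 0)
        return;
    a->vf_data = calloc(a->max_vfs, sizeof(*a->vf_data));
}

/* The real capability check: sets or clears the flag accordingly. */
static void init_interrupt_scheme(struct adapter_model *a, int msix_available)
{
    if (msix_available)
        a->flags |= FLAG_HAS_MSIX;
    else
        a->flags &= ~FLAG_HAS_MSIX;
}

/*
 * New call order: probe VFs first, then set up interrupts. With
 * assume_msix set, the flag is raised up front so probe_vfs() allocates.
 */
static void sw_init(struct adapter_model *a, int assume_msix, int msix_available)
{
    if (assume_msix)
        a->flags |= FLAG_HAS_MSIX;
    probe_vfs(a);
    init_interrupt_scheme(a, msix_available);
}
```

With `assume_msix` at 0 and `max_vfs` non-zero, `vf_data` stays NULL, which corresponds to the NULL pointer dereference in the trace above; assuming MSI-X first avoids it, and the flag is still corrected by the later capability check.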

i40e: Fix for recursive RTNL lock during PROMISC change · 30e2561b

Committed by Anjali Singhai on Sep 28, 2015

The sync_vsi_filters function can be called directly under RTNL
or through the timer subtask without one. This was causing a deadlock.

If sync_vsi_filters is called from a thread which held the lock,
and in another thread the PROMISC setting got changed we would
be executing the PROMISC change in the thread which already held
the lock alongside the other filter update. The PROMISC change
requires a reset if we are on a VEB, which requires it to be called
under RTNL.

Earlier the driver would call reset for PROMISC change without
checking if we were already under RTNL and would try to grab it
causing a deadlock. This patch changes the flow to see if we are
already under RTNL before trying to grab it.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix RS bit update in Tx path and disable force WB workaround · 58044743

Committed by Anjali Singhai on Sep 25, 2015

This patch fixes the issue of forcing WB too often causing us to not
benefit from NAPI.

Without this patch we were forcing WB/arming interrupt too often taking
away the benefits of NAPI and causing a performance impact.

With this patch we disable force WB in the clean routine for X710
and XL710 adapters. X722 adapters do not enable interrupt to force
a WB and benefit from WB_ON_ITR and hence force WB is left enabled
for those adapters.
For XL710 and X710 adapters, if we have fewer than 4 packets pending,
a software interrupt triggered from the service task will force a WB.

This patch also changes the conditions for setting RS bit as described
in code comments. This optimizes when the HW does a tail bump and when
it does a WB. It also optimizes when we do a wmb.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add GRE tunnel type to csum encoding · c1d1791d

Committed by Shannon Nelson on Sep 25, 2015

Make sure the Tx checksum encoder knows about GRE protocol and sets the
descriptor flag appropriately.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: refactor tx timeout logic · b03a8c1f

Committed by Kiran Patil on Sep 24, 2015

This patch modifies the driver timeout logic by issuing a writeback
request via a software interrupt to the hardware the first time the
driver detects a hang. The driver was too aggressive in resetting a hung
queue, so back that off by removing logic to down the netdevice after
too many hangs, and move the function to the service task.

Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Move i40e_get_head into header file · 1e6d6f8c

Committed by Kiran Patil on Sep 24, 2015

i40e_get_head needs to be called in multiple files in a further patch,
prepare by moving the function into a header file.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

openanolis / cloud-kernel: synced successfully nearly 2 years ago