Commits · 1a2881728211f0915c0fa1364770b9c73a67a073 · openeuler / raspberrypi-kernel

11 Nov, 2014 · 1 commit

mlx4: use napi_complete_done() · 1a288172

Committed by Eric Dumazet on Nov 06, 2014

To enable gro_flush_timeout, a driver has to use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACKs per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet).

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce the number of ACK packets (811/19.2 = 42).

Receiver performs fewer calls to upper stacks and fewer wakeups.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wakeups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
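
The ratios quoted in this message can be re-derived from the two sar lines. The sketch below is only a back-of-the-envelope check: it assumes the second and third numeric columns are rxpck/s and txpck/s, and it treats "data packets received per ACK sent" as a proxy for segments per GRO packet. The function name is made up for illustration.

```python
def gro_ratio(rx_pps: float, tx_pps: float) -> float:
    """Received data packets per ACK sent (proxy for segments per GRO packet)."""
    return rx_pps / tx_pps


# rxpck/s and txpck/s taken from the two "Average:" lines in the message.
before = gro_ratio(811269.80, 305732.30)
after = gro_ratio(811577.30, 19230.80)

# Fraction of ACK packets that disappear once gro_flush_timeout is set.
ack_reduction = 1 - 19230.80 / 305732.30

print(f"before: {before:.2f} segments per GRO packet")
print(f"after:  {after:.1f} segments per GRO packet")
print(f"ACK rate reduced by {ack_reduction:.1%}")
```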

04 Nov, 2014 · 2 commits

net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages · 1ab25f86

Committed by Ido Shamay on Nov 02, 2014

Needed in order to get cache cold pages (L3 flushed) for HW scatter.

Otherwise memory may flush those entries when the packet comes from
PCI, causing back pressure resulting in BW decrease.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Remove RX buffers alignment to IP_ALIGN · 5f6e9800

Committed by Ido Shamay on Nov 02, 2014

When IP_ALIGN has a non-zero value, hardware will write to a non-aligned
address. The only read from this address occurs when copying the header
from the first frag into the linear buffer (further access to the IP
address will be from the linear buffer, in which the headers are
aligned). Since the penalty of unaligned access by the hardware is
greater than that of the software memcpy, change frag_align to always be 0.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Oct, 2014 · 1 commit

mlx4: use napi_schedule_irqoff() · 477b35b4

Committed by Eric Dumazet on Oct 29, 2014

mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.

They can use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Oct, 2014 · 1 commit

net/mlx4_en: Cleanups suggested by clang static checker · c2a3d4b4

Committed by Jack Morgenstein on Oct 27, 2014

clang flagged the following. All are actually cosmetic cleanups, not really bugs:

drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;
                ^     ~~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;

drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
        entry->reg_id = reg_id;
                      ^ ~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
        mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
(NOTE: reg_id is only used in the device-managed flow steering path, in which it is always initialized.
 This is not a bug. Cleanup here is therefore cosmetic only).

drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
                frag_info = &priv->frag_info[i];
                ^           ~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Oct, 2014 · 1 commit

mlx4: fix race accessing page->_count · 98226208

Committed by Eric Dumazet on Oct 10, 2014

It is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
lose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
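
The lost increment can be illustrated with a deterministic, single-threaded toy model. Everything here is a stand-in: `Page`, `driver_set` and `driver_add` are hypothetical names, and the "add instead of set" variant is an assumed shape for a fix, not a quote from the patch.

```python
class Page:
    """Toy stand-in for struct page; only the reference count is modelled."""

    def __init__(self):
        self.count = 1  # the driver holds the initial reference


def get_page_unless_zero(page):
    """Take a reference unless the page is already free (count == 0)."""
    if page.count == 0:
        return False
    page.count += 1
    return True


def driver_set(page, frags):
    # Racy pattern: overwrite the count with one reference per RX fragment.
    page.count = frags


def driver_add(page, frags):
    # Assumed safer pattern: add the extra references, keeping concurrent ones.
    page.count += frags - 1


racy, safe = Page(), Page()
for page, take_refs in ((racy, driver_set), (safe, driver_add)):
    get_page_unless_zero(page)  # another kernel path grabs a reference first
    take_refs(page, 4)          # then the driver accounts for 4 fragments

# The racy page has forgotten the foreign reference; the safe one has not.
print(racy.count, safe.count)
```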

20 Sep, 2014 · 1 commit

net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da

Committed by Ido Shamay on Sep 18, 2014

This function derives the base address of the CQE from the CQE size,
and calculates the real CQE context segment in it from the factor
(as before). Before this change the code used the factor to
calculate the base address of the CQE as well.

The factor indicates in which segment of the cqe stride the cqe information
is located. For 32-byte strides, the segment is 0, and for 64-byte strides,
the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
only 32- and 64-byte strides. However, with larger strides, the factor is zero,
and so cannot be used to calculate the base of the CQE.

The helper uses the same method of CQE buffer pulling used by all other
components that read the CQE buffer (mlx4_ib driver and libmlx4).

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
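
A rough model of the two addressing schemes, in byte offsets. This is a sketch under assumptions: the 32-byte segment size is taken from the 32-byte stride case, the "old" formula is one reading of "used the factor to calculate the base address", and both function names are hypothetical.

```python
CQE_SEGMENT = 32  # bytes of CQE context (assumed from the 32-byte stride case)


def old_cqe_offset(index, factor):
    """Pre-patch scheme as described: the factor also scales the base."""
    return ((index << factor) + factor) * CQE_SEGMENT


def new_cqe_offset(index, cqe_size, factor):
    """Base derived from the CQE size, then the context segment from the factor."""
    return index * cqe_size + factor * CQE_SEGMENT


# (stride in bytes, factor) pairs from the message, plus a larger stride.
for cqe_size, factor in ((32, 0), (64, 1), (128, 0)):
    print(cqe_size, old_cqe_offset(3, factor), new_cqe_offset(3, cqe_size, factor))
```

The two schemes agree for 32- and 64-byte strides and diverge for the 128-byte stride, where a zero factor leaves the old base computed as if the stride were 32 bytes.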

06 Sep, 2014 · 1 commit

mlx4: only pull headers into skb head · cfecec56

Committed by Eric Dumazet on Sep 05, 2014

Use the new fancy eth_get_headlen() to pull exactly the headers
into skb->head.

This speeds up GRE traffic (or more generally tunneled traffic),
as GRO can aggregate up to 17 MSS per GRO packet instead of 8.

(Pulling too much data was forcing GRO to keep 2 frags per MSS)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
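
The 17 versus 8 figures follow from integer division over the skb fragment limit. The sketch assumes a limit of 17 fragments per skb (the usual MAX_SKB_FRAGS value with 4 KB pages); the helper name is illustrative.

```python
MAX_SKB_FRAGS = 17  # assumed: typical per-skb fragment limit with 4 KB pages


def max_mss_per_gro_packet(frags_per_mss):
    """Upper bound on aggregated segments when each one needs this many frags."""
    return MAX_SKB_FRAGS // frags_per_mss


# Two frags per MSS (too much data pulled) versus one frag per MSS.
print(max_mss_per_gro_packet(2), max_mss_per_gro_packet(1))
```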

30 Aug, 2014 · 1 commit

mlx4: Set skb->csum_level for encapsulated checksum · 9ca8600e

Committed by Tom Herbert on Aug 27, 2014

Set skb->csum_level instead of skb->encapsulation when indicating
CHECKSUM_UNNECESSARY for an encapsulated checksum.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jul, 2014 · 1 commit

net/mlx4_en: Reduce memory consumption on kdump kernel · ea1c1af1

Committed by Amir Vadai on Jul 22, 2014

When memory is limited, reduce the number of rx and tx rings.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Jul, 2014 · 1 commit

mlx4: mark napi id for gro_skb · 32b333fe

Committed by Jason Wang on Jul 14, 2014

The napi id was not marked for gro_skb. As a result, rx busy loop does not
work correctly, since the stack never tries to call the low latency receive
method because of a zero socket napi id. Fix this by marking the napi id
for gro_skb.

The transaction rate of 1 byte netperf tcp_rr increases by about 50%
(from 20531.68 to 30610.88).

Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Jul, 2014 · 1 commit

net/mlx4_en: Do not count LLC/SNAP in MTU calculation · d5b8dff0

Committed by Yishai Hadas on Jul 08, 2014

The 8 bytes of LLC/SNAP should not be added as part of the header calculation.
If used, payload will be decreased accordingly. For MTU of 1500
we'll set 1522 instead of 1523.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Jul, 2014 · 1 commit

net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map · 35f6f453

Committed by Amir Vadai on Jun 29, 2014

An IRQ can only have a single affinity notifier, and that slot is taken by
the cpu_rmap notifier, so it can't be used to track changes in the IRQ
affinity map. Instead, detect IRQ affinity changes by comparing the current
CPU to the current IRQ affinity map during NAPI poll.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2eacc23c ("net/mlx4_core: Enforce irq affinity changes immediatly")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
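
The detection described above amounts to a membership test made while the poll loop is busy. The sketch below models only that decision; the function name, the set-based affinity mask and the "leave the poll so the interrupt can fire elsewhere" reaction are assumptions for illustration, not the driver's code.

```python
def should_leave_poll(done, budget, current_cpu, irq_affinity):
    """Decide whether a NAPI poll should stop and re-arm its interrupt."""
    if done < budget:
        return True  # ring drained: normal completion path
    # Budget exhausted: polling would continue on this CPU indefinitely
    # unless the affinity map no longer contains it.
    return current_cpu not in irq_affinity


# Busy ring, CPU 2 still allowed: keep polling.
print(should_leave_poll(64, 64, 2, {2, 3}))
# Busy ring, affinity moved to CPU 4: stop so the IRQ can fire there.
print(should_leave_poll(64, 64, 2, {4}))
```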

03 Jun, 2014 · 1 commit

IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO · 40f2287b

Committed by Jiri Kosina on May 11, 2014

Modify the various routines used to allocate memory resources which
serve QPs in mlx4 to get an input GFP directive.  Have the Ethernet
driver use GFP_KERNEL in its QP allocations as done prior to this
commit, and the IB driver use GFP_NOIO when the IB verbs
IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

15 May, 2014 · 1 commit

net/mlx4_core: Enforce irq affinity changes immediatly · 2eacc23c

Committed by Yuval Atias on May 14, 2014

During heavy traffic, napi is constantly polling the completion queue
and no interrupt is fired. Because of that, changes to irq affinity are
ignored until traffic is stopped and resumed.

By registering to the irq notifier mechanism, and forcing an interrupt when
affinity is changed, irq affinity changes will be immediately enforced.

Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 May, 2014 · 1 commit

mellanox: Logging message cleanups · 1a91de28

Committed by Joe Perches on May 07, 2014

Use a more current logging style.

o Coalesce formats
o Add missing spaces for coalesced formats
o Align arguments for modified formats
o Add missing newlines for some logging messages
o Use DRV_NAME as part of format instead of %s, DRV_NAME to
  reduce overall text.
o Use ..., ##__VA_ARGS__ instead of args... in macros
o Correct a few format typos
o Use a single line message where appropriate

Signed-off-by: Joe Perches <joe@perches.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Mar, 2014 · 1 commit

mlx4: Don't receive packets when the napi budget == 0 · 38be0a34

Committed by Eric W. Biederman on Mar 14, 2014

Processing any incoming packets with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
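
The rule above can be pictured as an early return at the top of the receive loop. This is a minimal user-space model, assuming a list as the RX ring; `rx_poll` is a hypothetical name and popping an entry stands in for building and delivering an skb.

```python
def rx_poll(ring, budget):
    """Process at most `budget` packets from `ring`; return how many were handled."""
    if budget <= 0:
        return 0  # e.g. a netpoll call: no RX processing is allowed
    done = 0
    while ring and done < budget:
        ring.pop(0)  # stand-in for building and delivering an skb
        done += 1
    return done


ring = ["pkt0", "pkt1", "pkt2"]
print(rx_poll(ring, 0), len(ring))  # nothing consumed with a zero budget
print(rx_poll(ring, 2), len(ring))  # two packets consumed, one left
```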

25 Feb, 2014 · 2 commits

net/mlx4: Fix limiting number of IRQ's instead of RSS queues · bb2146bc

Committed by Ido Shamay on Feb 21, 2014

This fixes a performance bug introduced by commit 90b1ebe7 "mlx4: set
maximal number of default RSS queues", which limits the number of IRQs
opened by the core module.
The limit should be on the number of queues in the indirection table -
rx_rings, and not on the number of IRQs. Also, limiting on mlx4_core
initialization instead of in mlx4_en prevented using "ethtool -L" to
utilize all the CPUs when performance mode is preferred, since limiting
this number to 8 reduces overall packet rate by 15%-50% in multiple TCP
streams applications.

For example, after running ethtool -L <ethx> rx 16

                Packet rate
Before the fix  897799
After the fix   1142070

Results were obtained using netperf:

S=200 ; ( for i in $(seq 1 $S) ; do ( \
  netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
  wait ; done | grep "1        1" | awk '{SUM+=$6} END {print SUM}' )

CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
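
For reference, the relative gain implied by the two packet rates in the table can be computed directly. The helper name is illustrative; only the two measured values come from the message.

```python
def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


# Packet rates before and after the fix, with 16 RX rings.
gain = relative_gain(897_799, 1_142_070)
print(f"packet rate gain: {gain:.1%}")
```

The result falls inside the 15%-50% range the message quotes for multi-stream TCP workloads.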

net/mlx4: Set number of RX rings in a utility function · 02512482

Committed by Ido Shamay on Feb 21, 2014

mlx4_en_add() is too long.
Move setting the number of RX rings to a utility function to improve
readability and modularization of the code.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Jan, 2014 · 1 commit

net/mlx4_en: call gro handler for encapsulated frames · e6a76758

Committed by Eric Dumazet on Jan 09, 2014

In order to use the native GRO handling of encapsulated protocols on
mlx4, we need to call napi_gro_receive() instead of netif_receive_skb()
unless busy polling is in action.

While we are at it, rename mlx4_en_cq_ll_polling() to
mlx4_en_cq_busy_polling().

Tested with a GRE tunnel: GRO aggregation is now performed on the
ethernet device instead of being done later on the gre device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Jan, 2014 · 1 commit

net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling · 837052d0

Committed by Or Gerlitz on Dec 23, 2013

When the device tunneling offloads mode is vxlan, do the following:

 - call SET_PORT with the relevant setting

 - add DMFS steering vxlan rule for the device self and multicast mac addresses
   of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP

 - set relevant QPC fields in RSS context and RX ring QPs

 - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs
   which are marked for encapsulation such that the HW will segment them properly.

 - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE

 - advertise hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Dec, 2013 · 1 commit

net: mlx4 calls skb_set_hash · 69174416

Committed by Tom Herbert on Dec 17, 2013

Drivers should call skb_set_hash to set the hash and its type
in an skbuff.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Nov, 2013 · 2 commits

net/mlx4_en: Datapath structures are allocated per NUMA node · 163561a4

Committed by Eugenia Emantayev on Nov 07, 2013

For each RX/TX ring and its CQ, allocation is done on a NUMA node that
corresponds to the core that the data structure should operate on.
The assumption is that the core number is reflected by the ring index.
The affected allocations are the ring/CQ data structures,
the TX/RX info and the shared HW/SW buffer.
For TX rings, each core has rings of all UPs.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Datapath resources allocated dynamically · 41d942d5

Committed by Eugenia Emantayev on Nov 07, 2013

Currently all TX/RX rings and completion queues are part of the
netdev priv structure and are allocated statically. This patch
will change the priv to hold only arrays of pointers and therefore
all TX/RX rings and completion queues will be allocated
dynamically. This is in preparation for NUMA aware allocations.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Oct, 2013 · 2 commits

net/mlx4_en: Fix pages never dma unmapped on rx · 021f1107

Committed by Amir Vadai on Oct 07, 2013

This patch fixes a bug introduced by commit 51151a16 (mlx4: allow
order-0 memory allocations in RX path).

dma_unmap_page was never reached because the condition to detect the last
fragment in a page was wrong. offset+frag_stride can't be greater than size;
we need to make sure no additional frag will fit in the page => compare
offset + frag_stride + next_frag_size instead.
next_frag_size is the same as the current one, since a page is shared only
with frags of the same size.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
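
The difference between the two conditions shows up as soon as they are evaluated for every fragment that fits in a page. In this sketch the page size and stride are illustrative values, the function names are hypothetical, and the next fragment's size is taken to be the stride, which is one reading of "the same as the current one".

```python
def is_last_frag_old(offset, frag_stride, page_size):
    """Buggy check: a fragment that fits never ends beyond the page."""
    return offset + frag_stride > page_size


def is_last_frag_new(offset, frag_stride, page_size):
    """Fixed check: no further fragment of the same size fits after this one."""
    next_frag_size = frag_stride  # assumption: page holds frags of one size
    return offset + frag_stride + next_frag_size > page_size


PAGE_SIZE, STRIDE = 4096, 1536  # illustrative values only
# Start offsets of the fragments that fit entirely inside the page.
offsets = range(0, PAGE_SIZE - STRIDE + 1, STRIDE)

old = [is_last_frag_old(o, STRIDE, PAGE_SIZE) for o in offsets]
new = [is_last_frag_new(o, STRIDE, PAGE_SIZE) for o in offsets]
print(list(offsets), old, new)
```

With the old condition no fragment is ever recognised as the last one, so the unmap path is never taken; with the new one the final fragment in the page is flagged.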

net/mlx4_en: Rename name of mlx4_en_rx_alloc members · 70fbe079

Committed by Amir Vadai on Oct 07, 2013

Add a page prefix to page related members: rename @size and @offset to
@page_size and @page_offset.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jul, 2013 · 2 commits

net: rename ll methods to busy-poll · 8b80cda5

Committed by Eliezer Tamir on Jul 10, 2013

Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all users of these functions.
Update comments and defines in include/net/busy_poll.h

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rename include/net/ll_poll.h to include/net/busy_poll.h · 076bb0c8

Committed by Eliezer Tamir on Jul 10, 2013

Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jun, 2013 · 1 commit

mlx4: allow order-0 memory allocations in RX path · 51151a16

Committed by Eric Dumazet on Jun 23, 2013

Signed-off-by: Eric Dumazet <edumazet@google.com>

mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.

We therefore drop frames more than needed.

This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.

By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.

Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
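
The fallback strategy can be sketched as a loop over decreasing orders. This is a user-space model only: the 4 KB page size, the `try_alloc` callback and the fragmented-memory allocator are hypothetical stand-ins, not kernel APIs.

```python
PAGE_SIZE = 4096  # assumed page size


def alloc_rx_pages(try_alloc, max_order=3):
    """Try order-3 down to order-0; return (order, buffer) or None."""
    for order in range(max_order, -1, -1):
        page = try_alloc(order)
        if page is not None:
            return order, page
    return None


def fragmented_alloc(order):
    """Hypothetical allocator: memory is fragmented, only order <= 1 succeeds."""
    return bytearray(PAGE_SIZE << order) if order <= 1 else None


result = alloc_rx_pages(fragmented_alloc)
print("got order", result[0], "buffer of", len(result[1]), "bytes")

# Size of the initialization-time allocation quoted in the message.
init_bytes = 390 * (PAGE_SIZE << 3)
print(f"{init_bytes / 2**20:.1f} MiB for 390 order-3 pages")
```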

20 Jun, 2013 · 1 commit

net/mlx4_en: Add Low Latency Socket (LLS) support · 9e77a2b8

Committed by Amir Vadai on Jun 18, 2013

Add basic support for LLS.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Apr, 2013 · 1 commit

net/mlx4_en: Add HW timestamping (TS) support · ec693d47

Committed by Amir Vadai on Apr 23, 2013

This patch allows enabling/disabling HW timestamping for incoming and/or
outgoing packets. It adds and initializes all structs and callbacks
needed by the kernel TS API.
To enable/disable HW timestamping, the appropriate ioctl should be used.
Currently only HWTSTAMP_FILTER_ALL/NONE and HWTSTAMP_TX_ON/OFF are
supported.
When enabling TS on the receive flow, VLAN stripping will be disabled.
All relevant changes were also made in the RX/TX flows to consider TS requests
and plant HW timestamps into the relevant structures.
mlx4_ib was fixed to compile with the new mlx4_cq_alloc() signature.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Apr, 2013 · 1 commit

net: vlan: add protocol argument to packet tagging functions · 86a9bad3

Committed by Patrick McHardy on Apr 19, 2013

Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the skb's size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Feb, 2013 · 1 commit

hlist: drop the node parameter from iterators · b67bfe0d

Committed by Sasha Levin on Feb 27, 2013

I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
 were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Feb, 2013 · 1 commit

drivers: net: Remove remaining alloc/OOM messages · 14f8dc49

Committed by Joe Perches on Feb 07, 2013

alloc failures already get standardized OOM
messages and a dump_stack.

For the affected mallocs around these OOM messages:

Converted kmallocs with multiplies to kmalloc_array.
Converted a kmalloc/memcpy to kmemdup.
Removed now unused stack variables.
Removed unnecessary parentheses.
Neatened alignment.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Feb, 2013 · 3 commits

net/mlx4_en: Manage hash of MAC addresses per port · c07cb4b0

Committed by Yan Burman on Feb 07, 2013

As a preparation step for supporting multiple unicast addresses, store MAC addresses in a hash table.
Remove the radix tree for MAC addresses per QP, as it's not in use.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Optimize Rx fast path filter checks · 6bbb6d99

Committed by Yan Burman on Feb 07, 2013

Currently, RX path code that does RX filtering is not optimized
and does an expensive conversion. In order to use ether_addr_equal_64bits,
which is optimized for such cases, we need the MAC address kept by the device
to be in the form of an unsigned char array instead of a u64. Store the MAC address
as an unsigned char array and convert to/from u64 out of the fast path when needed.
A side effect of this is that we no longer need priv->mac, since it's the same
as dev->dev_addr.

This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
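
The to/from u64 conversion that moves out of the fast path can be modelled in a few lines. The sketch assumes the first octet of the address is the most significant byte of the integer; both helper names and the sample address are illustrative.

```python
def mac_to_u64(addr: bytes) -> int:
    """Pack a 6-byte MAC address into an integer, first octet most significant."""
    if len(addr) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    return int.from_bytes(addr, "big")


def u64_to_mac(value: int) -> bytes:
    """Unpack the low 48 bits of a 64-bit value back into a 6-byte MAC address."""
    return value.to_bytes(8, "big")[2:]


mac = bytes.fromhex("0002c9aabbcc")  # sample address for illustration
packed = mac_to_u64(mac)
print(hex(packed), u64_to_mac(packed).hex(":"))
```

Keeping the byte array as the stored form means the per-packet comparison works on bytes directly, and the integer form is only produced where a command interface needs it.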

net/mlx4_en: Optimize loopback related checks in data path · 79aeaccd

Committed by Yan Burman on Feb 07, 2013

Currently there are relatively complex conditional checks in the fast path,
for TX loopback enabling and the resulting RX filter logic.
Move elaborate if's out of the data path, replace them with a single flag
for each state and update that state from the appropriate places.
Also, in native (non SRIOV) mode and not in loopback or in selftest,
there is no need to try to filter out packets that HW looped back,
as in native mode we do not loop back packets anymore.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Nov, 2012 · 1 commit

mlx4: 64-byte CQE/EQE support · 08ff3235

Committed by Or Gerlitz on Oct 21, 2012

ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQEs the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g. as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows:

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware of the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows working with unmodified libmlx4 on older devices (e.g.
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

20 Nov, 2012 · 1 commit

mlx4_en: Remove remnants of LRO support · f1d29a3f

Committed by Ben Hutchings on Nov 16, 2012

Commit fa37a958 ('mlx4_en: Moving to work with GRO') left behind
the Kconfig depends/select, some dead code and comments referring to LRO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Aug, 2012 · 1 commit

net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC · c8c40b7f

Committed by Amir Vadai on Aug 03, 2012

Should NOT check SMAC=DMAC when:
1. loopback is turned on
2. validate_loopback is true.

Fixed it accordingly.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
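
The corrected rule can be written as a small predicate. This is a sketch of the decision only: the function name and argument names are illustrative, and MAC addresses are passed as plain byte strings.

```python
def drop_own_loopback(smac, dmac, loopback_on, validate_loopback):
    """Return True when an RX packet should be dropped as our own looped-back frame."""
    if loopback_on or validate_loopback:
        return False  # SMAC=DMAC must not be checked in these modes
    return smac == dmac


mac = bytes.fromhex("0002c9aabbcc")  # sample address for illustration
print(drop_own_loopback(mac, mac, False, False))  # normal mode: dropped
print(drop_own_loopback(mac, mac, True, False))   # loopback enabled: kept
print(drop_own_loopback(mac, mac, False, True))   # loopback self-test: kept
```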