1. 20 September 2014 (3 commits)
    • net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da
      Committed by Ido Shamay
      This function derives the base address of the CQE from the CQE size,
      and then uses the factor to locate the actual CQE context segment
      within it (as before). Prior to this change, the code used the factor
      to calculate the base address of the CQE as well.
      
      The factor indicates in which segment of the CQE stride the CQE
      information is located. For 32-byte strides the segment is 0, and for
      64-byte strides it is 1 (bytes 32..63). Using the factor was fine as
      long as we had only 32-byte and 64-byte strides. However, with larger
      strides the factor is zero, so it cannot be used to calculate the base
      of the CQE.
      
      The helper pulls CQEs from the buffer using the same method as the
      other components that read the CQE buffer (the mlx4_ib driver and
      libmlx4); a sketch of the addressing scheme follows below.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
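
      As a rough illustration of the addressing scheme described above, here
      is a minimal sketch in plain C; the helper name and the 32-byte context
      size are assumptions, not the actual mlx4_en code:

      #include <stdint.h>
      #include <stddef.h>

      #define CQE_CONTEXT_SIZE 32  /* bytes occupied by the CQE context segment */

      /*
       * The stride base comes from the CQE size alone; the factor only selects
       * the 32-byte segment holding the context (0 for 32B/128B/256B strides,
       * 1 for the legacy 64B stride, i.e. bytes 32..63).
       */
      static inline void *get_cqe_ctx(uint8_t *cq_buf, uint32_t index,
                                      uint32_t cqe_size, uint32_t factor)
      {
              uint8_t *stride_base = cq_buf + (size_t)index * cqe_size;

              return stride_base + (size_t)factor * CQE_CONTEXT_SIZE;
      }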
    • net/mlx4_core: Cache line EQE size support · 43c816c6
      Committed by Ido Shamay
      Enable the mlx4 interrupt handler to work with the EQE stride feature.
      The feature may be enabled when the cache line is larger than 64B. The
      EQE size is then the cache line size, and the context segment resides
      at offset [0-31] (a sketch follows below).
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
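
      A hedged sketch of how such EQEs might be indexed once the stride
      equals the cache line size; the names are illustrative, not the actual
      mlx4 interrupt-handler code:

      #include <stdint.h>
      #include <stddef.h>

      /*
       * With the EQE stride feature each entry occupies eqe_size bytes (the
       * cache line size, e.g. 128), but the event context always lives in
       * bytes [0..31] of the entry, so no extra offset is applied.
       */
      static inline void *get_eqe(uint8_t *eq_buf, uint32_t cons_index,
                                  uint32_t eqe_size, uint32_t nent)
      {
              /* nent is assumed to be a power of two */
              return eq_buf + (size_t)(cons_index & (nent - 1)) * eqe_size;
      }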
    • net/mlx4_core: Enable CQE/EQE stride support · 77507aa2
      Committed by Ido Shamay
      This feature is intended for archs whose cache line is larger than 64B.
      
      Since our CQEs/EQEs are generally 64B on those systems, the HW will
      write twice to the same cache line consecutively, causing pipe locks
      due to the hazard prevention mechanism. For elements in a cyclic
      buffer, writes are consecutive, so entries smaller than a cache line
      should be avoided, especially if they are written at a high rate.
      
      Reduce consecutive writes to the same cache line in CQs/EQs by allowing
      the driver to increase the distance between entries so that each
      resides in a different cache line. Until the introduction of this
      feature, there were two types of CQE/EQE:
      
      1. 32B stride and context in the [0-31] segment
      2. 64B stride and context in the [32-63] segment
      
      This feature introduces two additional types:
      
      3. 128B stride and context in the [0-31] segment (128B cache line)
      4. 256B stride and context in the [0-31] segment (256B cache line)
      
      Modify the mlx4_core driver to query the device for the CQE/EQE cache
      line stride capability and to enable that capability when the host
      cache line size is larger than 64 bytes (supported cache lines are
      128B and 256B).
      
      The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
      context behaviour is changed to require this change in VF drivers
      running on such archs.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
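
      One way to picture the resulting stride selection is the sketch below;
      the function and constant choices are assumptions rather than the
      actual mlx4_core logic:

      /*
       * Keep the default 64B entry unless the device reports the stride
       * capability and the host cache line is one of the supported larger
       * sizes (128B or 256B); in that case the stride grows to a full cache
       * line while the context stays in bytes [0..31].
       */
      static unsigned int pick_entry_stride(unsigned int cache_line_bytes,
                                            int hw_supports_stride)
      {
              if (hw_supports_stride &&
                  (cache_line_bytes == 128 || cache_line_bytes == 256))
                      return cache_line_bytes;

              return 64;      /* legacy behaviour: 32B/64B entries */
      }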
  2. 06 September 2014 (1 commit)
  3. 04 September 2014 (1 commit)
  4. 30 August 2014 (2 commits)
  5. 13 August 2014 (1 commit)
  6. 05 August 2014 (1 commit)
    • mlx4_core: Add support for secure-host and SMP firewall · 114840c3
      Committed by Jack Morgenstein
      Secure-host is the general term for the capability of a device
      to protect itself and the subnet from malicious host software.
      
      This is achieved by:
      1. Not allowing untrusted entities to access device configuration
         registers, either directly (through pci_cr or pci_conf) or
         indirectly (through MADs).
      
      2. Hiding M_Key from untrusted entities.
      
      3. Preventing the modification of GUID0 by untrusted entities.
      
      4. Not allowing drivers on untrusted hosts to receive or transmit
         packets over QP0 (SMP firewall).
      
      The secure-host capability depends on firmware handling all QP0
      packets, and not passing these packets up to the driver. Any information
      required by the driver for proper operation (e.g., SM lid) is passed
      via events generated by the firmware while processing QP0 MADs.
      
      Driver support mainly requires using the MAD_DEMUX FW command at startup,
      where the feature is enabled/disabled through a procedure described in
      the Mellanox HCA tools package.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      
      [ Fix error path in mlx4_setup_hca to go to err_mcg_table_free. - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
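
      Purely as an illustration of the startup flow mentioned above (probe a
      capability, then ask firmware to keep handling QP0 MADs), here is a
      sketch in which every identifier is a made-up placeholder, not the real
      mlx4 command interface:

      #include <stdio.h>

      enum { HCA_CAP_SECURE_HOST = 1u << 0 };       /* hypothetical cap bit */

      struct hca_dev { unsigned int caps; };        /* hypothetical device */

      /* Stand-in for issuing a MAD_DEMUX-style firmware command. */
      static int hca_cmd_mad_demux(struct hca_dev *dev, int enable)
      {
              (void)dev;      /* unused in this stub */
              printf("MAD_DEMUX %s\n", enable ? "enable" : "disable");
              return 0;
      }

      static int enable_smp_firewall(struct hca_dev *dev)
      {
              if (!(dev->caps & HCA_CAP_SECURE_HOST))
                      return 0;       /* firmware does not expose secure-host */

              /* Keep QP0 MADs in firmware; the driver then learns values such
               * as the SM LID from firmware-generated events instead. */
              return hca_cmd_mad_demux(dev, 1);
      }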
  7. 02 August 2014 (1 commit)
  8. 25 July 2014 (1 commit)
  9. 23 July 2014 (4 commits)
  10. 17 July 2014 (6 commits)
  11. 15 July 2014 (1 commit)
    • mlx4: mark napi id for gro_skb · 32b333fe
      Committed by Jason Wang
      The napi id was not marked for gro_skb. This caused the RX busy loop to
      not work correctly, since the stack never tries to call the low-latency
      receive method because of a zero socket napi id. Fix this by marking
      the napi id for gro_skb (see the sketch below).
      
      The transaction rate of a 1-byte netperf TCP_RR test increases by about
      50% (from 20531.68 to 30610.88).
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
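
      The marking itself presumably boils down to a call like the one in this
      simplified stand-in; the surrounding helper is an assumption, not the
      exact mlx4_en RX path:

      #include <linux/skbuff.h>
      #include <linux/netdevice.h>
      #include <net/busy_poll.h>

      /*
       * Tail of an RX completion handler: record which NAPI context the GRO
       * skb came from before handing it to the stack, so socket busy polling
       * can find the right queue (a zero napi_id disables it for that flow).
       */
      static void rx_deliver_gro(struct napi_struct *napi, struct sk_buff *gro_skb)
      {
              skb_mark_napi_id(gro_skb, napi);
              napi_gro_receive(napi, gro_skb);
      }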
  12. 09 July 2014 (7 commits)
  13. 08 July 2014 (1 commit)
  14. 03 July 2014 (2 commits)
  15. 23 June 2014 (1 commit)
  16. 12 June 2014 (1 commit)
    • net/mlx4_en: Use affinity hint · 9e311e77
      Committed by Yuval Atias
      The “affinity hint” mechanism is used by the user-space daemon
      irqbalance to indicate a preferred CPU mask for IRQs. irqbalance can
      use this hint to balance the IRQs among the CPUs indicated by the mask.
      
      We wish the HCA to preferentially map the IRQs it uses to NUMA cores
      close to it. To accomplish this, we use cpumask_set_cpu_local_first(),
      which sets the affinity hint according to the following policy: first,
      IRQs are mapped to “close” NUMA cores; once these are exhausted, the
      remaining IRQs are mapped to “far” NUMA cores (a sketch follows below).
      Signed-off-by: Yuval Atias <yuvala@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
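
      A sketch of that policy, assuming the (index, node, mask) calling
      convention cpumask_set_cpu_local_first() had at the time; the ring
      bookkeeping is illustrative, not the actual mlx4_en code:

      #include <linux/cpumask.h>
      #include <linux/interrupt.h>
      #include <linux/gfp.h>

      /* Each ring keeps its hint mask alive, because irq_set_affinity_hint()
       * only stores the pointer it is given. */
      struct ring_irq {
              int irq;
              cpumask_var_t hint_mask;
      };

      /* Publish NUMA-local-first affinity hints for all ring interrupts. */
      static void set_ring_affinity_hints(struct ring_irq *rings, int nrings,
                                          int numa_node)
      {
              int i;

              for (i = 0; i < nrings; i++) {
                      if (!zalloc_cpumask_var(&rings[i].hint_mask, GFP_KERNEL))
                              continue;
                      /* pick the i-th CPU, preferring cores local to the HCA */
                      if (cpumask_set_cpu_local_first(i, numa_node,
                                                      rings[i].hint_mask) < 0)
                              continue;
                      irq_set_affinity_hint(rings[i].irq, rings[i].hint_mask);
              }
      }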
  17. 11 June 2014 (2 commits)
    • net/mlx4_core: Keep only one driver entry release mlx4_priv · da1de8df
      Committed by Wei Yang
      Following commit befdf897 "net/mlx4_core: Preserve pci_dev_data after
      __mlx4_remove_one()", there are two mlx4 pci callbacks which will
      attempt to release the mlx4_priv object -- .shutdown and .remove.
      
      This leads to a use-after-free access to the already freed mlx4_priv
      instance and triggers a "Kernel access of bad area" crash when both
      .shutdown and .remove are called.
      
      During reboot or kexec, .shutdown is called: the VFs probed on the host
      go through shutdown first, and then the PF. Later, the PF triggers the
      VFs' .remove, since the VFs still have a driver attached.
      
      Fix that by keeping only one driver entry which releases mlx4_priv.
      
      Fixes: befdf897 ('net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()')
      CC: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
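
      As a generic illustration of the single-release idea above (not the
      actual mlx4 patch), the sketch below lets only .remove free the private
      data, so a shutdown-then-remove sequence cannot free it twice; all
      names are placeholders:

      #include <linux/pci.h>
      #include <linux/slab.h>

      struct my_priv { int dummy; };          /* placeholder private data */

      static void my_shutdown(struct pci_dev *pdev)
      {
              /* quiesce DMA/interrupts here, but do not free the drvdata */
      }

      static void my_remove(struct pci_dev *pdev)
      {
              struct my_priv *priv = pci_get_drvdata(pdev);

              if (!priv)
                      return;                 /* nothing left to release */

              kfree(priv);
              pci_set_drvdata(pdev, NULL);    /* make repeat calls harmless */
      }

      static struct pci_driver my_driver = {
              .name     = "my_dev",
              .remove   = my_remove,
              .shutdown = my_shutdown,
      };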
    • net/mlx4_core: Fix SRIOV free-pool management when enforcing resource quotas · 95646373
      Committed by Jack Morgenstein
      The Hypervisor driver tracks free slots and reserved slots at the global level
      and tracks allocated slots and guaranteed slots per VF.
      
      Guaranteed slots are treated as reserved by the driver, so the total
      number of reserved slots is the sum of all guaranteed slots over all VFs.
      
      As VFs allocate resources, free (global) is decremented and allocated (per VF)
      is incremented for those resources. However, reserved (global) is never changed.
      
      This means that effectively, when a VF allocates a resource from its
      guaranteed pool, it is actually reducing that resource's free pool (since
      the global reserved count was not also reduced).
      
      The fix for this problem is the following: For each resource, as long as a
      VF's allocated count is <= its guaranteed number, when allocating for that
      VF, the reserved count (global) should be reduced by the allocation as well.
      
      When the global reserved count reaches zero, the remaining global free count
      is still accessible as the free pool for that resource.
      
      When the VF frees resources, the reverse happens: the global reserved
      count for a resource is incremented only once the VF's allocated count
      falls below its guaranteed number (see the sketch below).
      
      This fix was developed by Rick Kready <kready@us.ibm.com>
      Reported-by: Rick Kready <kready@us.ibm.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
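
      One way to read the accounting rule described above is the pair of
      helpers below; the field names and exact checks are assumptions, not
      the mlx4_core resource tracker itself:

      /*
       * Global per-resource pool plus per-VF quota state: "free" counts all
       * unallocated slots, "reserved" is the sum of still-unused guarantees,
       * so free - reserved is what the shared pool can actually hand out.
       */
      struct res_pool { int free; int reserved; };
      struct vf_quota { int allocated; int guaranteed; };

      static int imin(int a, int b) { return a < b ? a : b; }

      /* Grant `count` slots to a VF, charging its guarantee first. */
      static int grant_res(struct res_pool *p, struct vf_quota *vf, int count)
      {
              int from_guarantee = 0;

              if (vf->allocated < vf->guaranteed)
                      from_guarantee = imin(count, vf->guaranteed - vf->allocated);

              /* slots beyond the guarantee must fit in the unreserved part */
              if (count - from_guarantee > p->free - p->reserved)
                      return -1;

              p->free       -= count;
              p->reserved   -= from_guarantee;  /* the fix: shrink reserved too */
              vf->allocated += count;
              return 0;
      }

      /* Release `count` slots; re-reserve what drops back under the guarantee. */
      static void release_res(struct res_pool *p, struct vf_quota *vf, int count)
      {
              int back_under;

              vf->allocated -= count;
              p->free       += count;

              back_under = vf->guaranteed - vf->allocated;
              if (back_under > 0)
                      p->reserved += imin(back_under, count);
      }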
  18. 07 June 2014 (1 commit)
  19. 05 June 2014 (1 commit)
  20. 03 June 2014 (2 commits)