Commits · 5deb67f77a266010e2c10fb124b7516d0d258ce8 · openanolis / cloud-kernel

01 Sep, 2017 8 commits

libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7

Committed by Robin Murphy on Aug 31, 2017

mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, btt: rework error clearing · d9b83c75

Committed by Vishal Verma on Aug 30, 2017

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this, error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only by a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm: fix potential deadlock while clearing errors · 0930a750

Committed by Vishal Verma on Aug 30, 2017

With the ACPI NFIT 'DSM' methods, ACPI can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, btt: cache sector_size in arena_info · 75892004

Committed by Vishal Verma on Aug 30, 2017

In preparation for the error clearing rework, add sector_size in the
arena_info struct.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d

Committed by Vishal Verma on Aug 30, 2017

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, btt: refactor map entry operations with macros · 0595d539

Committed by Vishal Verma on Aug 30, 2017

Add helpers for converting a raw map entry to just the block number, or
either of the 'e' or 'z' flags in preparation for actually using the
error flag to mark blocks with media errors.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
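
The map entry helpers described in this commit could look roughly like the sketch below. It assumes a 32-bit map entry whose top bit is the 'z' flag, whose next bit is the 'e' flag, and whose remaining 30 bits hold the block number. The macro names are illustrative and are not claimed to match the names used in the patch.

```c
#include <stdint.h>

/* Assumed layout: bit 31 = 'z' (zero) flag, bit 30 = 'e' (error) flag,
 * bits 29..0 = block number. All names here are illustrative. */
#define SKETCH_MAP_Z_MASK   (UINT32_C(1) << 31)
#define SKETCH_MAP_E_MASK   (UINT32_C(1) << 30)
#define SKETCH_MAP_LBA_MASK (~(SKETCH_MAP_Z_MASK | SKETCH_MAP_E_MASK))

/* Strip both flags and keep only the block number. */
#define sketch_ent_lba(ent)    ((uint32_t)(ent) & SKETCH_MAP_LBA_MASK)
/* Normalise each flag to 0 or 1. */
#define sketch_ent_e_flag(ent) (!!((uint32_t)(ent) & SKETCH_MAP_E_MASK))
#define sketch_ent_z_flag(ent) (!!((uint32_t)(ent) & SKETCH_MAP_Z_MASK))
```

With such helpers, callers can compare block numbers and flags separately instead of open-coding shifts and masks at every site.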

libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce

Committed by Vishal Verma on Aug 30, 2017

The IO context conversion for rw_bytes missed a case in the BTT write
path (btt_map_write) which should've been marked as atomic.

In reality this should not cause a problem, because map writes are too
small for nsio_rw_bytes to attempt error clearing, but it should be
fixed for posterity.

Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
things like the nfit unit tests, which don't actually sleep, can catch
bugs like this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute · a15797f4

Committed by Dan Williams on Aug 31, 2017

When the nfit driver initializes it runs an ARS (Address Range Scrub)
operation across every pmem range. Part of that process involves
determining the ARS capabilities of a given address range. One of the
capabilities that is reported is the 'Clear Uncorrectable Error Range
Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 -
Query ARS Capabilities). This property is of interest to userspace
software as it indicates the boundary at which the NVDIMM may need to
perform read-modify-write cycles to maintain ECC blocks.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

30 Aug, 2017 2 commits

libnvdimm, btt: check memory allocation failure · ed36b4db

Committed by Christophe Jaillet on Aug 27, 2017

Check memory allocation failures and return -ENOMEM in such cases, as
already done a few lines below for another memory allocation.

This avoids a NULL pointer dereference.

Cc: <stable@vger.kernel.org>
Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, label: fix index block size calculation · 02881768

Committed by Dan Williams on Aug 29, 2017

The old calculation assumed that the label space was 128k and the label
size is 128. With v1.2 labels where the label size is 256 this
calculation will return zero. We are saved by the fact that the
nsindex_size is always pre-initialized from a previous 128 byte
assumption and we are lucky that the index sizes turn out the same.

Fix this going forward in case we start encountering different
geometries of label areas besides 128k.

Since the label size can change from one call to the next, drop the
caching of nsindex_size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

16 Aug, 2017 1 commit

libnvdimm, pfn, dax: limit namespace alignments to the supported set · f13d2b61

Committed by Dan Williams on Aug 11, 2017

Now that we properly advertise the supported pte, pmd, and pud sizes,
restrict the supported alignments that can be set on a namespace. This
assumes that userspace was not previously relying on the ability to set
odd alignments. At least ndctl only ever supported setting the namespace
alignment to 4K, 2M, or 1G.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

12 Aug, 2017 2 commits

libnvdimm, pfn, dax: show supported dax/pfn region alignments in sysfs · 1fdadbeb

Committed by Oliver O'Halloran on Jun 27, 2017

The alignment of DAX and PFN regions dictates the page sizes that can
be used to map the region. Even if the hardware page sizes are known the
actual range of supported page sizes that can be used with DAX depends
on the kernel configuration. As a result it's best that the kernel
advertises the alignments that should be used with these region types.

This patch adds the 'supported_alignments' region attribute to expose
this information to userspace.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[djbw: integrate with nd_size_select_show() rename and other fixups]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm: rename nd_sector_size_{show,store} to nd_size_select_{show,store} · b2c48f9f

Committed by Dan Williams on Aug 11, 2017

Prepare for another consumer of this size selection scheme that is
not a 'sector size'.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

08 Aug, 2017 1 commit

nfit: cleanup long de-reference chains in acpi_nfit_init_interleave_set · dcb79b15

Committed by Dan Williams on Aug 07, 2017

Use a local 'struct acpi_nfit_control_region *' variable to shorten the
pointer chasing chains.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

05 Aug, 2017 1 commit

nfit, libnvdimm, region: export 'position' in mapping info · 401c0a19

Committed by Dan Williams on Aug 04, 2017

It is useful to be able to know the position of a DIMM in an
interleave-set. Consider the case where the order of the DIMMs changes
causing a namespace to be invalidated because the interleave-set cookie no
longer matches. If the before and after state of each DIMM position is
known, this state can be debugged by the system owner.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

26 Jul, 2017 1 commit

libnvdimm: Stop using HPAGE_SIZE · 0dd69643

Committed by Oliver O'Halloran on Jun 27, 2017

Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
in common with THP than hugetlbfs we should probably be using
HPAGE_PMD_SIZE, but this is undefined when THP is disabled so let's just
give it a new name.

The other usage of HPAGE_SIZE in libnvdimm is when determining how large
the altmap should be. For the reasons mentioned above it doesn't really
make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
use in generic code and it happens to match the vmemmap allocation block
on x86 and Power. It's still a hack, but it's a slightly nicer hack.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

23 Jul, 2017 2 commits

xen/balloon: don't online new memory initially · 96edd61d

Committed by Juergen Gross on Jul 10, 2017

When setting up the Xenstore watch for the memory target size the new
watch will fire at once. Don't try to reach the configured target size
by onlining new memory in this case, as the current memory size will
be smaller in almost all cases due to e.g. BIOS reserved pages.

Onlining new memory will lead to more problems e.g. undesired conflicts
with NVMe devices meant to be operated as block devices.

Instead remember the difference between target size and current size
when the watch fires for the first time and apply it to any further
size changes, too.

In order to avoid races between balloon.c and xen-balloon.c init calls
do the xen-balloon.c initialization from balloon.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>

xen/grant-table: log the lack of grants · 29d11cfd

Committed by Wengang Wang on Jul 18, 2017

Log a message when we enter this situation:
1) we already allocated the max number of available grants from hypervisor
and
2) we still need more (but the request fails because of 1)).

Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debugging.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>

21 Jul, 2017 4 commits

ide: avoid warning for timings calculation · 921edf31

Committed by Arnd Bergmann on Jul 14, 2017

gcc-7 warns about the result of a constant multiplication used as
a boolean:

drivers/ide/ide-timings.c: In function 'ide_timing_quantize':
drivers/ide/ide-timings.c:112:24: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
  q->setup   = EZ(t->setup   * 1000,  T);

This slightly rearranges the macro to simplify the code and avoid
the warning at the same time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
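
One way to rearrange such a macro is sketched below: the scaling by 1000 moves inside the macro, so the condition tests the unscaled value and no multiplication appears in boolean context. This is a sketch under the assumption that the macros round a scaled time up to whole clock units; it is not claimed to be the exact change made by the patch, and the wrapper function name is hypothetical.

```c
/* Round v up to a whole number of units (v is assumed to be positive). */
#define ENOUGH(v, unit) (((v) - 1) / (unit) + 1)

/* Zero stays zero; otherwise scale by 1000 and round up to whole units.
 * The ?: condition sees plain (v), not (v) * 1000, which is what the
 * gcc-7 -Wint-in-bool-context warning objected to. */
#define EZ(v, unit) ((v) ? ENOUGH((v) * 1000, unit) : 0)

/* Hypothetical wrapper so the macro can be exercised from plain C. */
static int sketch_quantize(int t, int unit)
{
	return EZ(t, unit);
}
```

A call site then passes the raw value, for example `EZ(t->setup, T)` rather than `EZ(t->setup * 1000, T)`.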

net: bonding: Fix transmit load balancing in balance-alb mode · cbf5ecb3

Committed by Kosuke Tatsukawa on Jul 20, 2017

balance-alb mode used to have transmit dynamic load balancing feature
enabled by default.  However, transmit dynamic load balancing no longer
works in balance-alb after commit 8b426dc5 ("bonding: remove
hardcoded value").

Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
send packets.  This function uses the parameter tlb_dynamic_lb.
tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
now the value is set to 0 except in balance-tlb.

Re-enable transmit dynamic load balancing by initializing tlb_dynamic_lb
for balance-alb similar to balance-tlb.

Fixes: 8b426dc5 ("bonding: remove hardcoded value")
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw: Push the request_irq function to the end of probe · 070f9c65

Committed by Keerthy on Jul 20, 2017

Push the request_irq function to the end of probe so as
to ensure all the required fields are populated in the event
of an ISR getting executed right after requesting the irq.

Currently while loading the crash kernel a crash was seen as
soon as devm_request_threaded_irq was called. This was due to
n->poll being NULL which is called as part of net_rx_action
function.
Suggested-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Add missing ARL entries for BCM53125 · be35e8c5

Committed by Florian Fainelli on Jul 20, 2017

The BCM53125 entry was missing an arl_entries member which would
basically prevent the ARL search from terminating properly. This switch
has 4 ARL entries, so add that.

Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jul, 2017 18 commits

RDMA/core: Initialize port_num in qp_attr · a62ab66b

Committed by Ismail, Mustafa on Jul 14, 2017

Initialize the port_num for iWARP in rdma_init_qp_attr.

Fixes: 5ecce4c9 ("Check port number supplied by user verbs cmds")
Cc: <stable@vger.kernel.org> # v2.6.14+
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/uverbs: Fix the check for port number · 5a7a88f1

Committed by Ismail, Mustafa on Jul 14, 2017

The port number is only valid if IB_QP_PORT is set in the mask.
So only check port number if it is valid to prevent modify_qp from
failing due to an invalid port number.

Fixes: 5ecce4c9 ("Check port number supplied by user verbs cmds")
Cc: <stable@vger.kernel.org> # v2.6.14+
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
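
The rule in this commit (validate the port number only when the mask says it is present) can be sketched as follows. The mask constant, the helper names, and the port range parameters are hypothetical stand-ins rather than the uverbs API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the IB_QP_PORT mask bit; value is illustrative. */
#define SKETCH_QP_PORT (UINT32_C(1) << 5)

/* True when port lies in the device's valid range [first, last]. */
static bool sketch_port_valid(unsigned int port, unsigned int first,
			      unsigned int last)
{
	return port >= first && port <= last;
}

/* Returns 0 when the request may proceed, -1 when the port is invalid.
 * The port is only inspected if the caller flagged it as supplied. */
static int sketch_check_port(uint32_t attr_mask, unsigned int port_num,
			     unsigned int first, unsigned int last)
{
	if ((attr_mask & SKETCH_QP_PORT) &&
	    !sketch_port_valid(port_num, first, last))
		return -1;
	return 0;
}
```

A request that leaves the port bit clear passes regardless of what the unused port field happens to contain.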

RDMA/iser: don't send an rkey if all data is written as immediate-data · e6e52aec

Committed by Sagi Grimberg on Jul 06, 2017

We might get some bogus error completions in case the target will
remotely invalidate the rkey and the HCA will need to retransmit
from this buffer.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

rxe: fix broken receive queue draining · 12171971

Committed by Vijay Immanuel on Jun 27, 2017

If we modified the qp to ERROR state, and
drained the receive queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/qedr: Prevent memory overrun in verbs' user responses · c75d3ec8

Committed by Amrani, Ram on Jun 26, 2017

Wrap ib_copy_to_udata with a function that ensures that the data
being copied over to user space isn't longer than allowed.

Fixes: cecbcddf ("qedr: Add support for QP verbs")
Fixes: a7efd777 ("qedr: Add support for PD,PKEY and CQ verbs")
Fixes: ac1b36e5 ("qedr: Add support for user context verbs")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
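
The wrapper idea from this commit can be illustrated with a small userspace sketch: clamp the copy length to the size of the caller's output buffer before copying. The structure and function names below are hypothetical, and `memcpy` stands in for the real copy-to-user helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the user-data descriptor: a destination
 * buffer plus the number of bytes the caller said it can accept. */
struct sketch_udata {
	void *outbuf;
	size_t outlen;
};

/* Never copy more than the destination was declared to hold. */
static size_t sketch_bounded_len(size_t len, size_t outlen)
{
	return len < outlen ? len : outlen;
}

/* Copy at most udata->outlen bytes and report how many were copied. */
static size_t sketch_copy_to_udata(struct sketch_udata *udata,
				   const void *src, size_t len)
{
	size_t n = sketch_bounded_len(len, udata->outlen);

	memcpy(udata->outbuf, src, n);
	return n;
}
```

Truncating the response keeps older user-space libraries with smaller response structures working while preventing writes past their buffer.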

iw_cxgb4: don't use WR keys/addrs for 0 byte reads · 720336c4

Committed by Ganesh Goudar on Jun 21, 2017

Only use the read sge lkey/addr and the remote rkey/addr if the
length of the read is not zero. Otherwise the read response might
be treated as the RTR read response and not delivered to the
application. Or worse, Terminator hardware will fail a 0B read
if the STAG is 0 even if the read length is 0.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Fix CM REQ retries in paravirt mode · 4542e3c7

Committed by Håkon Bugge on Jun 20, 2017

CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.

By checking if an id exists before creating one, the bug is fixed.

This bug can be provoked by running an RDMA CM user-land application,
but inserting a five seconds delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.

Here is an excerpt from ibdump running without the patch:

3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject
7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject

and here is the same with bug fix applied:

3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello)
8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse
Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rdmavt: Setting of QP timeout can overflow jiffies computation · a25ce427

Committed by Kaike Wan on Jun 17, 2017

Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause
overflow due to the fact that the input to the function usecs_to_jiffies
is only 32-bit (unsigned int). Overflow will occur when attr->timeout is
equal to or greater than 30. The consequence is unnecessarily excessive
retry and thus degradation of the system performance.

This patch fixes the problem by limiting the input to 5-bit and calling
usecs_to_jiffies() before multiplying the scaling factor.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
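
The overflow can be reproduced in userspace with a simulated conversion helper. The sketch below assumes the timeout encodes 4.096 µs × 2^timeout and a tick rate of 1000 per second; `sketch_usecs_to_jiffies` is a stand-in that only mimics the 32-bit argument of the kernel helper, and the "old" and "new" functions are reconstructions from the commit description rather than the patch itself.

```c
#include <stdint.h>

#define SKETCH_HZ 1000u	/* assumed tick rate */

/* Stand-in conversion: like the helper described above, it only accepts
 * a 32-bit microsecond count. Rounds up to whole ticks. */
static uint64_t sketch_usecs_to_jiffies(uint32_t usecs)
{
	return ((uint64_t)usecs * SKETCH_HZ + 999999u) / 1000000u;
}

/* Old shape: scale first, then pass the result through the 32-bit
 * argument. For timeout >= 30 the microsecond count exceeds 2^32 and
 * is silently truncated. */
static uint64_t sketch_timeout_old(uint8_t timeout)
{
	uint64_t usecs = (UINT64_C(4096) << (timeout & 0x3f)) / 1000u;

	return sketch_usecs_to_jiffies((uint32_t)usecs);
}

/* New shape: clamp to 5 bits, convert 2^timeout microseconds (always
 * fits in 32 bits), then apply the 4.096 scaling factor afterwards. */
static uint64_t sketch_timeout_new(uint8_t timeout)
{
	if (timeout > 31)
		timeout = 31;
	return sketch_usecs_to_jiffies(UINT32_C(1) << timeout) * 4096u / 1000u;
}
```

Under these assumptions, a timeout of 30 yields a far shorter interval from the old shape than a timeout of 29 does, which matches the excessive retries described in the commit message.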

IB/core: Fix sparse warnings · 266098b8

Committed by Matan Barak on Jun 08, 2017

Delete unused variables to prevent sparse warnings.

Fixes: db1b5ddd ("IB/core: Rename uverbs event file structure")
Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema")
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Fix the value reported for local ack delay · 601577b7

Committed by Selvin Xavier on Jun 29, 2017

Local ack delay exposed by the driver is 0, which means infinite QP
timeout. Report the default value of 16 (approx 260ms) instead.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq · 499e4569

Committed by Selvin Xavier on Jun 29, 2017

While invoking the req_notify_cq hook, ULPs can request
whether the CQs have any CQEs pending. If CQEs are pending,
drivers can indicate it by returning 1 for req_notify_cq.
The stack will poll the CQ again until it is empty.

This patch peeks the CQ for any valid entries and returns accordingly.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Fix return value of poll routine · 10d1dedf

Committed by Devesh Sharma on Jun 29, 2017

Fix the incorrect reporting of number of polled
entries by taking into account the max CQ depth
in the driver.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Enable atomics only if host bios supports · 254cd259

Committed by Devesh Sharma on Jun 29, 2017

Driver shall check if the host system bios has enabled
Atomic operations capability in PCI Device Control 2
register of the pci-device. Expose the ATOMIC_HCA
flag only if the Atomic operations capability is set.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Specify RDMA component when allocating stats context · 536f0928

Committed by Somnath Kotur on Jun 29, 2017

Starting with FW version 20.6.47, firmware keeps separate statistics
for L2 and RDMA. However, the driver needs to specify RDMA or not when
allocating stat_ctx.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP · a25d112f

Committed by Eddie Wai on Jun 29, 2017

There are a couple of bugs in the support of max_rd_atomic and
max_dest_rd_atomic. In the modify_qp, if the requested max_rd_atomic,
which is the ORRQ size, is greater than what the chip can support,
then we have to cap the request to chip max as we can't have the HW
overflow the ORRQ. Capping the max_rd_atomic support internally is okay
to do as the remaining read/atomic WRs will still be sitting in the SQ.
However, for the max_dest_rd_atomic, the driver has to error out as
this dictates the IRRQ size and we can't control what the remote
side sends.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
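
The asymmetric handling described in this commit (cap the initiator side, reject the destination side) can be sketched as two small helpers. The function names and the choice of -EINVAL as the error code are assumptions for illustration, not taken from the driver.

```c
#include <errno.h>

/* Initiator side (ORRQ): a request above the chip limit is capped,
 * since the excess read/atomic WRs simply wait in the send queue. */
static unsigned int sketch_cap_max_rd_atomic(unsigned int requested,
					     unsigned int chip_max)
{
	return requested > chip_max ? chip_max : requested;
}

/* Destination side (IRRQ): the remote peer controls how much arrives,
 * so a request above the chip limit has to be rejected. */
static int sketch_check_max_dest_rd_atomic(unsigned int requested,
					   unsigned int chip_max)
{
	return requested > chip_max ? -EINVAL : 0;
}
```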

RDMA/bnxt_re: Report supported value to IB stack in query_device · 58d4a671

Committed by Selvin Xavier on Jun 29, 2017

- Report supported value for max_mr_size to IB stack in query_device.
  Also, check and log if MR size requested by application in
  reg_user_mr() is greater than value currently supported by driver.
- Report only 4K page size support for now.
- Fix Max_QP value returned by ibv_devinfo -vv.
  In case of PF, FW reserves 129 QPs for creating QP1s of VFs
  and PF. So the max_qp value reported by FW for PF doesn't include
  the QP1. Fix this issue by adding 1 to the value reported
  by FW.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails · 4a62c5e9

Committed by Selvin Xavier on Jun 29, 2017

This fix is added only to avoid a system crash in a
specific scenario. When bnxt_re driver is loaded and if
user tries to change interface mac address, delete GID
fails because QP1 is still associated with existing MAC
(default GID). If the above command fails GID tables are
not modified in the h/w or driver, but the GID context memory
is freed. Now, if the user changes the mac back to the original
value, another add_gid comes to the driver where the driver
reports that the GID is already present in its table
and tries to access the context which was already freed.

So, in this case, in order to avoid NULL pointer de-reference,
this patch removes the context memory free if delete_gid fails
and the same context memory is re-used in new add_gid.
Memory cleanup will be taken care during driver unload, while
deleting the GID table.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error · ab69d4c8

Committed by Somnath Kotur on Jun 29, 2017

Posting a WQE size of 2 results in a WQE_FORMAT_ERROR
thrown by the HW, as it requires the host to supply a WQE size with room
for at least one SGE so that the resulting WQE size is at least 3.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

openanolis / cloud-kernel synced successfully more than 1 year ago
