- 01 9月, 2017 8 次提交
-
-
由 Robin Murphy 提交于
mmio_flush_range() suffers from a lack of clearly-defined semantics, and is somewhat ambiguous to port to other architectures where the scope of the writeback implied by "flush" and ordering might matter, but MMIO would tend to imply non-cacheable anyway. Per the rationale in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the only existing use is actually to invalidate clean cache lines for ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent cleanup of the pmem API, that also now happens to be the exact purpose of arch_invalidate_pmem(), which would be a far more well-defined tool for the job. Rather than risk potentially inconsistent implementations of mmio_flush_range() for the sake of one callsite, streamline things by removing it entirely and instead move the ARCH_MEMREMAP_PMEM related definitions up to the libnvdimm level, so they can be shared by NFIT as well. This allows NFIT to be enabled for arm64. Signed-off-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
Clearing errors or badblocks during a BTT write requires sending an ACPI DSM, which means potentially sleeping. Since a BTT IO happens in atomic context (preemption disabled, spinlocks may be held), we cannot perform error clearing in the course of an IO. Due to this error clearing for BTT IOs has hitherto been disabled. In this patch we move error clearing out of the atomic section, and thus re-enable error clearing with BTTs. When we are about to add a block to the free list, we check if it was previously marked as an error, and if it was, we add it to the freelist, but also set a flag that says error clearing will be required. We then drop the lane (ending the atomic context), and send a zero buffer so that the error can be cleared. The error flag in the free list is protected by the nd 'lane', and is set only be a thread while it holds that lane. When the error is cleared, the flag is cleared, but while holding a mutex for that freelist index. When writing, we check for two things - 1/ If the freelist mutex is held or if the error flag is set. If so, this is an error block that is being (or about to be) cleared. 2/ If the block is a known badblock based on nsio->bb The second check is required because the BTT map error flag for a map entry only gets set when an error LBA is read. If we write to a new location that may not have the map error flag set, but still might be in the region's badblock list, we can trigger an EIO on the write, which is undesirable and completely avoidable. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths. Specifically, the DSM to clear media errors is called during writes, so that we can provide a writes-fix-errors model. However it is easy to imagine a scenario like: -> write through the nvdimm driver -> acpi allocation -> writeback, causes more IO through the nvdimm driver -> deadlock Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO flag for the current scope when issuing commands/IOs that are expected to clear errors. Cc: <linux-acpi@vger.kernel.org> Cc: <linux-nvdimm@lists.01.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
In preparation for the error clearing rework, add sector_size in the arena_info struct. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
In btt_map_read, we read the map twice to make sure that the map entry didn't change after we added it to the read tracking table. In anticipation of expanding the use of the error bit, also make sure that the error and zero flags are constant across the two map reads. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
Add helpers for converting a raw map entry to just the block number, or either of the 'e' or 'z' flags in preparation for actually using the error flag to mark blocks with media errors. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
The IO context conversion for rw_bytes missed a case in the BTT write path (btt_map_write) which should've been marked as atomic. In reality this should not cause a problem, because map writes are to small for nsio_rw_bytes to attempt error clearing, but it should be fixed for posterity. Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that things like the nfit unit tests, which don't actually sleep, can catch bugs like this. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
When the nfit driver initializes it runs an ARS (Address Range Scrub) operation across every pmem range. Part of that process involves determining the ARS capabilities of a given address range. One of the capabilities that is reported is the 'Clear Uncorrectable Error Range Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 - Query ARS Capabilities). This property is of interest to userspace software as it indicates the boundary at which the NVDIMM may need to perform read-modify-write cycles to maintain ECC blocks. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 30 8月, 2017 2 次提交
-
-
由 Christophe Jaillet 提交于
Check memory allocation failures and return -ENOMEM in such cases, as already done few lines below for another memory allocation. This avoids NULL pointers dereference. Cc: <stable@vger.kernel.org> Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format") Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
The old calculation assumed that the label space was 128k and the label size is 128. With v1.2 labels where the label size is 256 this calculation will return zero. We are saved by the fact that the nsindex_size is always pre-initialized from a previous 128 byte assumption and we are lucky that the index sizes turn out the same. Fix this going forward in case we start encountering different geometries of label areas besides 128k. Since the label size can change from one call to the next, drop the caching of nsindex_size. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 8月, 2017 1 次提交
-
-
由 Dan Williams 提交于
Now that we properly advertise the supported pte, pmd, and pud sizes, restrict the supported alignments that can be set on a namespace. This assumes that userspace was not previously relying on the ability to set odd alignments. At least ndctl only ever supported setting the namespace alignment to 4K, 2M, or 1G. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 12 8月, 2017 2 次提交
-
-
由 Oliver O'Halloran 提交于
The alignment of a DAX and PFN regions dictates the page sizes that can be used to map the region. Even if the hardware page sizes are known the actual range of supported page sizes that can be used with DAX depends on the kernel configuration. As a result it's best that the kernel advertises the alignments that should be used with these region types. This patch adds the 'supported_alignments' region attribute to expose this information to userspace. Signed-off-by: NOliver O'Halloran <oohall@gmail.com> [djbw: integrate with nd_size_select_show() rename and other fixups] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Prepare for other another consumer of this size selection scheme that is not a 'sector size'. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 08 8月, 2017 1 次提交
-
-
由 Dan Williams 提交于
Use a local 'struct acpi_nfit_control_region *' variable to shorten the pointer chasing chains. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 05 8月, 2017 1 次提交
-
-
由 Dan Williams 提交于
It is useful to be able to know the position of a DIMM in an interleave-set. Consider the case where the order of the DIMMs changes causing a namespace to be invalidated because the interleave-set cookie no longer matches. If the before and after state of each DIMM position is known this state debugged by the system owner. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 26 7月, 2017 1 次提交
-
-
由 Oliver O'Halloran 提交于
Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more in common with THP than hugetlbfs we should proably be using HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just give it a new name. The other usage of HPAGE_SIZE in libnvdimm is when determining how large the altmap should be. For the reasons mentioned above it doesn't really make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to use in generic code and it happens to match the vmemmap allocation block on x86 and Power. It's still a hack, but it's a slightly nicer hack. Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 23 7月, 2017 2 次提交
-
-
由 Juergen Gross 提交于
When setting up the Xenstore watch for the memory target size the new watch will fire at once. Don't try to reach the configured target size by onlining new memory in this case, as the current memory size will be smaller in almost all cases due to e.g. BIOS reserved pages. Onlining new memory will lead to more problems e.g. undesired conflicts with NVMe devices meant to be operated as block devices. Instead remember the difference between target size and current size when the watch fires for the first time and apply it to any further size changes, too. In order to avoid races between balloon.c and xen-balloon.c init calls do the xen-balloon.c initialization from balloon.c. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
由 Wengang Wang 提交于
log a message when we enter this situation: 1) we already allocated the max number of available grants from hypervisor and 2) we still need more (but the request fails because of 1)). Sometimes the lack of grants causes IO hangs in xen_blkfront devices. Adding this log would help debuging. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
- 21 7月, 2017 4 次提交
-
-
由 Arnd Bergmann 提交于
gcc-7 warns about the result of a constant multiplication used as a boolean: drivers/ide/ide-timings.c: In function 'ide_timing_quantize': drivers/ide/ide-timings.c:112:24: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] q->setup = EZ(t->setup * 1000, T); This slightly rearranges the macro to simplify the code and avoid the warning at the same time. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kosuke Tatsukawa 提交于
balance-alb mode used to have transmit dynamic load balancing feature enabled by default. However, transmit dynamic load balancing no longer works in balance-alb after commit 8b426dc5 ("bonding: remove hardcoded value"). Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to send packets. This function uses the parameter tlb_dynamic_lb. tlb_dynamic_lb used to have the default value of 1 for balance-alb, but now the value is set to 0 except in balance-tlb. Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb for balance-alb similar to balance-tlb. Fixes: 8b426dc5 ("bonding: remove hardcoded value") Signed-off-by: NKosuke Tatsukawa <tatsu@ab.jp.nec.com> Acked-by: NAndy Gospodarek <andy@greyhouse.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Keerthy 提交于
Push the request_irq function to the end of probe so as to ensure all the required fields are populated in the event of an ISR getting executed right after requesting the irq. Currently while loading the crash kernel a crash was seen as soon as devm_request_threaded_irq was called. This was due to n->poll being NULL which is called as part of net_rx_action function. Suggested-by: NSekhar Nori <nsekhar@ti.com> Signed-off-by: NKeerthy <j-keerthy@ti.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
The BCM53125 entry was missing an arl_entries member which would basically prevent the ARL search from terminating properly. This switch has 4 ARL entries, so add that. Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2017 18 次提交
-
-
由 Ismail, Mustafa 提交于
Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c9("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ismail, Mustafa 提交于
The port number is only valid if IB_QP_PORT is set in the mask. So only check port number if it is valid to prevent modify_qp from failing due to an invalid port number. Fixes: 5ecce4c9("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
We might get some bogus error completions in case the target will remotely invalidate the rkey and the HCA will need to retransmit from this buffer. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Vijay Immanuel 提交于
If we modified the qp to ERROR state, and drained the recieve queue, post_recv must trigger the responder task to complete the drain work request. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NVijay Immanuel <vijayi@attalasystems.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>-- Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Amrani, Ram 提交于
Wrap ib_copy_to_udata with a function that ensures that the data being copied over to user space isn't longer than the allowed. Fixes: cecbcddf ("qedr: Add support for QP verbs") Fixes: a7efd777 ("qedr: Add support for PD,PKEY and CQ verbs") Fixes: ac1b36e5 ("qedr: Add support for user context verbs") Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ganesh Goudar 提交于
Only use the read sge lkey/addr and the remote rkey/addr if the length of the read is not zero. Otherwise the read response might be treated as the RTR read response and not delivered to the application. Or worse Terminator hardware will fail a 0B read if the STAG is 0 even if the read length is 0. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Håkon Bugge 提交于
CM REQs cannot be successfully retried, because a new pv_cm_id is created for each request, without checking if one already exists. By checking if an id exists before creating one, the bug is fixed. This bug can be provoked by running an RDMA CM user-land application, but inserting a five seconds delay before the rdma_accept() call on the passive side. This delay is larger than the default CMA timeout, and triggers a retry from the active side. The retried REQ will use another pv_cm_id (the cm_id on the wire). This confuses the CM protocol and two REJs are sent from the passive side. Here is an excerpt from ibdump running without the patch: 3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject 7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject and here is the same with bug fix applied: 3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello) 8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse Suggested-by: NVenkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com> Reported-by: NWei Lin Guay <wei.lin.guay@oracle.com> Tested-by: NWei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com> Acked-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause overflow due to the fact that the input to the function usecs_to_jiffies is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is equal to or greater than 30. The consequence is unnecessarily excessive retry and thus degradation of the system performance. This patch fixes the problem by limiting the input to 5-bit and calling usecs_to_jiffies() before multiplying the scaling factor. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
Delete unused variables to prevent sparse warnings. Fixes: db1b5ddd ("IB/core: Rename uverbs event file structure") Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema") Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
Local ack delay exposed by the driver is 0 which means infinite QP timeout. Reporting the default value to 16 (approx 260ms) Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
While invoking the req_notify_cq hook, ULPs can request whether the CQs have any CQEs pending. If CQEs are pending, drivers can indicate it by returning 1 for req_notify_cq. The stack will poll CQ again till CQ is empty. This patch peeks the CQ for any valid entries and return accordingly. Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Devesh Sharma 提交于
Fix the incorrect reporting of number of polled entries by taking into account the max CQ depth in the driver. Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Devesh Sharma 提交于
Driver shall check if the host system bios has enabled Atomic operations capability in PCI Device Control 2 register of the pci-device. Expose the ATOMIC_HCA flag only if the Atomic operations capability is set. Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Somnath Kotur 提交于
Starting FW version 20.6.47, firmware is keeping separate statistics for L2 and RDMA. However, driver needs to specify RDMA or not when allocating stat_ctx. Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Eddie Wai 提交于
There's a couple of bugs in the support of max_rd_atomic and max_dest_rd_atomic. In the modify_qp, if the requested max_rd_atomic, which is the ORRQ size, is greater than what the chip can support, then we have to cap the request to chip max as we can't have the HW overflow the ORRQ. Capping the max_rd_atomic support internally is okay to do as the remaining read/atomic WRs will still be sitting in the SQ. However, for the max_dest_rd_atomic, the driver has to error out as this dictates the IRRQ size and we can't control what the remote side sends. Signed-off-by: NEddie Wai <eddie.wai@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
- Report supported value for max_mr_size to IB stack in query_device. Also, check and log if MR size requested by application in reg_user_mr() is greater than value currently supported by driver. - Report only 4K page size support for now - Fix Max_QP value returned by ibv_devinfo -vv. In case of PF, FW reserves 129 QPs for creating QP1s of VFs and PF. So the max_qp value reported by FW for PF doesn'tt include the QP1. Fixing this issue by adding 1 with the value reported by FW. Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
This fix is added only to avoid system crash in some a specific scenario. When bnxt_re driver is loaded and if user tries to change interface mac address, delete GID fails because QP1 is still associated with existing MAC (default GID). If the above command fails GID tables are not modified in the h/w or driver, but the GID context memory is freed. Now, if the user changes the mac back to the original value, another add_gid comes to the driver where the driver reports that the GID is already present in its table and tries to access the context which was already freed. So, in this case, in order to avoid NULL pointer de-reference, this patch removes the context memory free if delete_gid fails and the same context memory is re-used in new add_gid. Memory cleanup will be taken care during driver unload, while deleting the GID table. Signed-off-by: NKalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Somnath Kotur 提交于
Posting WQE size of 2 results in a WQE_FORMAT_ERROR thrown by the HW as it requires host to supply WQE Size with room for atleast one SGE so that the resulting WQE size be atleast 3. Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-