提交 · 097b285d32c7cb22dd4af2286ba61668a6c367ef · openanolis / cloud-kernel

13 12月, 2015 1 次提交

drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 26bbe7ef

由 Seth Jennings 提交于 12月 11, 2015

Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86.  This made it
possible to have multiple sections per memory block where previously,
there was a only every one section per block.

Since blocks consist of contiguous ranges of section, there can be holes
in the blocks where sections are not present.  If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.

This patch is a quick fix to gaurd against the crash by not allowing
blocks with non-present sections to be offlined.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: NSeth Jennings <sjennings@variantweb.net>
Reported-by: NAndrew Banman <abanman@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

26bbe7ef

12 12月, 2015 1 次提交

parisc iommu: fix panic due to trying to allocate too large region · e46e31a3

由 Mikulas Patocka 提交于 11月 30, 2015

When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.

Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources

CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
 [<000000004021497c>] show_stack+0x14/0x20
 [<0000000040410bf0>] dump_stack+0x88/0x100
 [<000000004023978c>] panic+0x124/0x360
 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
 [<0000000040453150>] sba_map_sg+0x260/0x5b8
 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
 [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
 [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
 [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
 [<000000004049da34>] scsi_request_fn+0x6e4/0x970
 [<00000000403e95a8>] __blk_run_queue+0x40/0x60
 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
 [<000000004049a534>] scsi_run_queue+0x2a4/0x360
 [<000000004049be68>] scsi_end_request+0x1a8/0x238
 [<000000004049de84>] scsi_io_completion+0xfc/0x688
 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0

The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).

The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.

How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:

	if (startsg->length + dma_len > max_seg_size)
		break;

When it finishes coalescing adjacent entries, it allocates the mapping:

sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
	PIDE_FLAG
	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
	| dma_offset;

It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.

To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
	if (startsg->length + dma_len > max_seg_size)
		break;
to
	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
		break;

This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NHelge Deller <deller@gmx.de>

e46e31a3

11 12月, 2015 1 次提交

vgaarb: fix signal handling in vga_get() · 9f5bd308

由 Kirill A. Shutemov 提交于 11月 30, 2015

There are few defects in vga_get() related to signal hadning:

  - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
    case;

  - if we found pending signal we must remove ourself from wait queue
    and change task state back to running;

  - -ERESTARTSYS is more appropriate, I guess.
Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Cc: stable@vger.kernel.org
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

9f5bd308

10 12月, 2015 4 次提交

dm btree: fix bufio buffer leaks in dm_btree_del() error path · ed8b45a3

由 Joe Thornber 提交于 12月 10, 2015

If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

ed8b45a3

ipmi: move timer init to before irq is setup · 27f972d3

由 Jan Stancek 提交于 12月 08, 2015

We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.

static int smi_start_processing(void       *send_info,
                                ipmi_smi_t intf)
{
        /* Try to claim any interrupts. */
        if (new_smi->irq_setup)
                new_smi->irq_setup(new_smi);

 --> IRQ arrives here and irq handler tries to modify uninitialized timer

    which triggers BUG_ON(!timer->function) in __mod_timer().

 Call Trace:
   <IRQ>
   [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
   [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
   [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
   [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
   [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
   [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
   [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
   [<ffffffff8100fc59>] handle_irq+0x49/0xa0
   [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
   [<ffffffff8100ba53>] ret_from_intr+0x0/0x11

        /* Set up the timer that drives the interface. */
        setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);

The following patch fixes the problem.

To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: NJan Stancek <jstancek@redhat.com>
Signed-off-by: NTony Camuso <tcamuso@redhat.com>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # Applies cleanly to 3.10-, needs small rework before

27f972d3

dm space map metadata: fix ref counting bug when bootstrapping a new space map · 50dd842a

由 Joe Thornber 提交于 12月 09, 2015

When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

50dd842a

dm thin metadata: fix bug when taking a metadata snapshot · 49e99fc7

由 Joe Thornber 提交于 12月 09, 2015

When you take a metadata snapshot the btree roots for the mapping and
details tree need to have their reference counts incremented so they
persist for the lifetime of the metadata snap.

The roots being incremented were those currently written in the
superblock, which could possibly be out of date if concurrent IO is
triggering new mappings, breaking of sharing, etc.

Fix this by performing a commit with the metadata lock held while taking
a metadata snapshot.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

49e99fc7

09 12月, 2015 14 次提交

of/irq: Export of_irq_find_parent again · 4c3141e0

由 Carlo Caione 提交于 12月 01, 2015

of_irq_find_parent was made static since it had no users outside of
of_irq.c. Export it again since we are going to use it again.
Signed-off-by: NCarlo Caione <carlo@endlessm.com>
[robh: move of_irq_find_parent to correct ifdef section]
Signed-off-by: NRob Herring <robh@kernel.org>

4c3141e0

radeon: Fix VCE IB test on Big-Endian systems · 361c32d3

由 Oded Gabbay 提交于 12月 04, 2015

This patch makes the VCE IB test pass on Big-Endian systems. It converts
to little-endian the contents of the VCE message.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

361c32d3

radeon: Fix VCE ring test for Big-Endian systems · 687f4b98

由 Oded Gabbay 提交于 12月 04, 2015

This patch fixes the VCE ring test when running on Big-Endian machines.
Every write to the ring needs to be translated to little-endian.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

687f4b98

radeon/cik: Fix GFX IB test on Big-Endian · 5f3e226f

由 Oded Gabbay 提交于 12月 04, 2015

This patch makes the IB test on the GFX ring pass for CI-based cards
installed in Big-Endian machines.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5f3e226f

drm/amdgpu: fix the lost duplicates checking · e410b5cb

由 Chunming Zhou 提交于 12月 07, 2015

Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NJammy Zhou <Jammy.Zhou@amd.com>
Cc: stable@vger.kernel.org

e410b5cb

drm/nouveau/pmu: remove whitelist for PGOB-exit WAR, enable by default · 714a98fc

由 Ben Skeggs 提交于 12月 09, 2015

NVIDIA have indicated that the workaround is required on all GK10[467]
boards that have the PGOB fuse set.

I've left the commandline option in place for now, as paranoia.
Signed-off-by: NBen Skeggs <bskeggs@redhat.com>

714a98fc

IB/mlx5: Postpone remove_keys under knowledge of coming preemption · ab5cdc31

由 Leon Romanovsky 提交于 10月 21, 2015

The remove_keys() logic is performed as garbage collection task. Such
task is intended to be run when no other active processes are running.

The need_resched() will return TRUE if there are user tasks to be
activated in near future.

In such case, we don't execute remove_keys() and postpone
the garbage collection work to try to run in next cycle,
in order to free CPU resources to other tasks.

The possible pseudo-code to trigger such scenario:
1. Allocate a lot of MR to fill the cache above the limit.
2. Wait a small amount of time "to calm" the system.
3. Start CPU extensive operations on multi-node cluster.
4. Expect performance degradation during MR cache shrink operation.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ab5cdc31

IB/mlx4: Use vmalloc for WR buffers when needed · 0ef2f05c

由 Wengang Wang 提交于 10月 08, 2015

There are several hits that WR buffer allocation(kmalloc) failed.
It failed at order 3 and/or 4 contigous pages allocation. At the same time
there are actually 100MB+ free memory but well fragmented.
So try vmalloc when kmalloc failed.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Acked-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0ef2f05c

IB/mlx4: Use correct order of variables in log message · 73d4da7b

由 Wengang Wang 提交于 10月 08, 2015

There is a mis-order in mlx4 log. Fix it.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Acked-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

73d4da7b

null_blk: Fix error path in module initialization · af096e22

由 Minfei Huang 提交于 12月 08, 2015

Module couldn't release resource properly during the initialization. To
fix this issue, we will clean up the proper resource before returning.
Signed-off-by: NMinfei Huang <mnfhuang@gmail.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

af096e22

iser-target: Remove explicit mlx4 work-around · f1a47d37

由 Sagi Grimberg 提交于 10月 28, 2015

The driver now exposes sufficient limits so we can
avoid having mlx4 specific work-around.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f1a47d37

mlx4: Expose correct max_sge_rd limit · a5e14ba3

由 Sagi Grimberg 提交于 10月 28, 2015

mlx4 devices (ConnectX-2, ConnectX-3) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)

So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a5e14ba3

IB/mad: Require CM send method for everything except ClassPortInfo · 53370886

由 Hal Rosenstock 提交于 11月 13, 2015

Receipt of CM MAD with other than the Send method for an attribute
other than the ClassPortInfo attribute is invalid.

CM attributes other than ClassPortInfo only use the send method.

The SRP initiator does not maintain a timeout policy for CM connect
requests relies on the CM layer to do that. The result was that
the SRP initiator hung as the connect request never completed.

A new SRP target has been observed to respond to Send CM REQ
with GetResp of CM REQ with bad status. This is non conformant
with IBA spec but exposes a vulnerability in the current MAD/CM
code which will respond to the incoming GetResp of CM REQ as if
it was a valid incoming Send of CM REQ rather than tossing
this on the floor. It also causes the MAD layer not to
retransmit the original REQ even though it has not received a REP.
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NHal Rosenstock <hal@mellanox.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

53370886

IB/cma: Add a missing rcu_read_unlock() · d3632493

由 Bart Van Assche 提交于 11月 20, 2015

Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if
fib_lookup() fails. Detected by sparse. Compile-tested only.

Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac).
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d3632493

08 12月, 2015 19 次提交

of/fdt: Add mutex protection for calls to __unflatten_device_tree() · f8062386

由 Guenter Roeck 提交于 12月 05, 2015

__unflatten_device_tree() calls unflatten_dt_node(), which declares
a static variable. It is therefore not reentrant.

One of the callers of __unflatten_device_tree(), unflatten_device_tree(),
is only called once during early initialization and does not need to be
protected. The other caller, of_fdt_unflatten_tree(), can be called at
any time, possibly multiple times in parallel. This can happen, for
example, if multiple devicetree overlays have to be loaded and installed.

Without this protection, errors such as the following may be seen.

kernel: End of tree marker overwritten: e6a3a458
kernel: find_target_node:
	Failed to find target-indirect node at /fragment@0
kernel: __of_overlay_create: of_build_overlay_info() failed for tree@/

Add a mutex to of_fdt_unflatten_tree() to make the call reentrant.

Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: NRob Herring <robh@kernel.org>

f8062386

drm/vmwgfx: Implement the cursor_set2 callback v2 · 8fbf9d92

由 Thomas Hellstrom 提交于 11月 26, 2015

Fixes native drm clients like Fedora 23 Wayland which now appears to
be able to use cursor hotspots without strange cursor offsets.
Also fixes a couple of ignored error paths.

Since the core drm cursor hotspot is incompatible with the legacy vmwgfx
hotspot (the core drm hotspot is reset when the drm_mode_cursor ioctl
is used), we need to keep track of both and add them when the device
hotspot is set. We assume that either is always zero.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: NSinclair Yeh <syeh@vmware.com>

8fbf9d92

cxl: Set endianess of kernel contexts · e606e035

由 Frederic Barrat 提交于 12月 07, 2015

A process element (defined in CAIA) keeps track of the endianess of
contexts through the Little Endian (LE) bit of the State Register. It
is currently set for user contexts, but was somehow forgotten for
kernel contexts, so this patch fixes it.
It could lead to erratic behavior from an AFU when the context is
attached through the kernel API.

Fixes: 2f663527 ("cxl: Configure PSL for kernel contexts and merge code")
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Suggested-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e606e035

IB core: Fix ib_sg_to_pages() · 8f5ba10e

由 Bart Van Assche 提交于 12月 03, 2015

On 12/03/2015 01:18 AM, Christoph Hellwig wrote:
> The patch looks good to me, but while we touch this area, how about
> throwing in a few cosmetic fixes as well?

How about the patch below ? In that version of the ib_sg_to_pages() fix
these concerns have been addressed and additionally to more bugs have been fixed.

------------

[PATCH] IB core: Fix ib_sg_to_pages()

Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary.

In the code for coalescing contiguous elements, ensure that
mr->length is correct and that last_page_addr is up-to-date.

Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.

Fixes: commit 4c67e2bf ("IB/core: Introduce new fast registration API")
Reported-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8f5ba10e

IB/srp: Fix srp_map_sg_fr() · 57b0be9c

由 Bart Van Assche 提交于 12月 01, 2015

After dma_map_sg() has been called the return value of that function
must be used as the number of elements in the scatterlist instead of
scsi_sg_count().

Fixes: commit f7f7aab1 ("IB/srp: Convert to new registration API")
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.4+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

57b0be9c

IB/srp: Fix indirect data buffer rkey endianness · a745f4f4

由 Bart Van Assche 提交于 12月 01, 2015

Detected by sparse.

Fixes: commit 330179f2 ("IB/srp: Register the indirect data buffer descriptor")
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.3+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a745f4f4

IB/srp: Initialize dma_length in srp_map_idb · fc925518

由 Christoph Hellwig 提交于 12月 01, 2015

Without this sg_dma_len will return 0 on architectures tha have
the dma_length field.

Fixes: commit f7f7aab1 ("IB/srp: Convert to new registration API")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

fc925518

IB/srp: Fix possible send queue overflow · 09c0c0be

由 Sagi Grimberg 提交于 12月 01, 2015

When using work request based memory registration (fast_reg)
we must reserve SQ entries for registration and invalidation
in addition to send operations. Each IO consumes 3 SQ entries
(registration, send, invalidation) so we need to allocate 3x
larger send-queue instead of 2x.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

09c0c0be

IB/srp: Fix a memory leak · 4d59ad29

由 Bart Van Assche 提交于 12月 01, 2015

If srp_connect_ch() returns a positive value then that is considered
by its caller as a connection failure but this does not result in a
scsi_host_put() call and additionally causes the srp_create_target()
function to return a positive value while it should return a negative
value. Avoid all this confusion and additionally fix a memory leak by
ensuring that srp_connect_ch() always returns a value that is <= 0.
This patch avoids that a rejected login triggers the following memory
leak:

unreferenced object 0xffff88021b24a220 (size 8):
  comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
  hex dump (first 8 bytes):
    68 6f 73 74 35 38 00 a5                          host58..
  backtrace:
    [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
    [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
    [<ffffffff81260d2b>] kvasprintf+0x5b/0x90
    [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
    [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
    [<ffffffff81337e3c>] dev_set_name+0x3c/0x40
    [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
    [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
    [<ffffffff8133778b>] dev_attr_store+0x1b/0x20
    [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
    [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
    [<ffffffff81176eef>] __vfs_write+0x2f/0xf0
    [<ffffffff811771e4>] vfs_write+0xa4/0x100
    [<ffffffff81177c64>] SyS_write+0x54/0xc0
    [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4d59ad29

IB/sa: Put netlink request into the request list before sending · 3ebd2fd0

由 Kaike Wan 提交于 10月 30, 2015

It was found by Saurabh Sengar that the netlink code tried to allocate
memory with GFP_KERNEL while holding a spinlock. While it is possible
to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
to get rid of the spinlock while sending the packet. However, in order
to protect against a race condition that a quick response may be received
before the request is put on the request list, we need to put the request
on the list first.
Signed-off-by: NKaike Wan <kaike.wan@intel.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reported-by: NSaurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3ebd2fd0

IB/iser: use sector_div instead of do_div · 2c63d107

由 Arnd Bergmann 提交于 11月 20, 2015

do_div is the wrong way to divide a sector_t, as it is less
efficient when sector_t is 32-bit wide. With the upcoming
do_div optimizations, the kernel starts warning about this:

drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div'
include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type

This changes the code to use sector_div instead, which always
produces optimal code.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2c63d107

IB/core: use RCU for uverbs id lookup · d144da8c

由 Mike Marciniszyn 提交于 11月 02, 2015

The current implementation gets a spin_lock, and at any scale with
qib and hfi1 post send, the lock contention grows exponentially
with the number of QPs.

idr_find() is RCU compatibile, so read doesn't need the lock.

Change to use rcu_read_lock() and rcu_read_unlock() in
__idr_get_uobj().

kfree_rcu() is used to insure a grace period between the
idr removal and actual free.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-By: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d144da8c

IB/qib: Minor fixes to qib per SFF 8636 · 57ab2512

由 Easwar Hariharan 提交于 11月 02, 2015

Minor errors found via code inspection during future development.
SFF 8636 defines bit position 2 to hold the status indication of
QSFP memory paging. The mask used to test for the value was
incorrect and is fixed in this patch. Additionally, the dump
function had a mismatch between the field being printed out and
the field used to source the data which was fixed.
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reported-by: NEaswar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: NEaswar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

57ab2512

IB/core: Fix user mode post wr corruption · 1d784b89

由 Mike Marciniszyn 提交于 12月 01, 2015

Commit e622f2f4 ("IB: split struct ib_send_wr")
introduced a regression for HCAs whose user mode post
sends go through ib_uverbs_post_send().

The code didn't account for the fact that the first sge is
offset by an operation dependent length.  The allocation did,
but the pointer to the destination sge list is computed without
that knowledge.  The sge list copy_from_user() then corrupts
fields in the work request

Store the operation dependent length in a local variable and
compute the sge list copy_from_user() destination using that length.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

1d784b89

IB/qib: Fix qib_mr structure · 785f7422

由 Ira Weiny 提交于 11月 30, 2015

struct qib_mr requires the mr member be the last because struct
qib_mregion contains a dynamic array at the end.  The additions
of members should have been placed before this structure as the
comment noted.

Failure to do so was causing random memory corruption.  Reproducing
this bug was easy to do by running the client and server of
ib_write_bw -s 8 -n 5 on the same node.

This BUG() was tripped in a slab debug kernel:

kernel BUG at mm/slab.c:2572!

Fixes: 38071a46 ("IB/qib: Support the new memory registration API")
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

785f7422

lightnvm: do not compile in debugging by default · 41586244

由 Matias Bjørling 提交于 12月 06, 2015

The LightNVM module exposes a debug interface when CONFIG_NVM_DEBUG is
set. This interfaces takes a string to configure media managers and
targets. Make sure this interface is only exposed when chosen
deliberately.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

41586244

lightnvm: prevent gennvm module unload on use · 008b7443

由 Matias Bjørling 提交于 12月 06, 2015

After the gennvm module has been initialized. It might be attached to
one or several devices. In that case, the module is in use. Make sure
that it can not be unloaded.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

008b7443

lightnvm: fix media mgr registration · 762796bc

由 Matias Bjørling 提交于 12月 06, 2015

This patch fixes two issues during media manager registration.

1. The ppa pool can be used at media manager registration. Allocate the
ppa pool before that.

2. If a media manager can't be found, this should not lead to the
device being unallocated. A media manager can be registered later, that
can manage the device. Only warn if a media manager fails
initialization.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

762796bc

lightnvm: replace req queue with nvmdev for lld · 16f26c3a

由 Matias Bjørling 提交于 12月 06, 2015

In the case where a request queue is passed to the low lever lightnvm
device drive integration, the device driver might pass its admin
commands through another queue. Instead pass nvm_dev, and let the
low level drive the appropriate queue.
Reported-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

16f26c3a

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功