提交 · 08eee69fcf6baea543a2b4d2a2fcba0e61aa3160 · openeuler / Kernel

13 2月, 2015 6 次提交

zram: remove init_lock in zram_make_request · 08eee69f

由 Minchan Kim 提交于 2月 12, 2015

Admin could reset zram during I/O operation going on so we have used
zram->init_lock as read-side lock in I/O path to prevent sudden zram
meta freeing.

However, the init_lock is really troublesome.  We can't do call
zram_meta_alloc under init_lock due to lockdep splat because
zram_rw_page is one of the function under reclaim path and hold it as
read_lock while other places in process context hold it as write_lock.
So, we have used allocation out of the lock to avoid lockdep warn but
it's not good for readability and fainally, I met another lockdep splat
between init_lock and cpu_hotplug from kmem_cache_destroy during working
zsmalloc compaction.  :(

Yes, the ideal is to remove horrible init_lock of zram in rw path.  This
patch removes it in rw path and instead, add atomic refcount for meta
lifetime management and completion to free meta in process context.
It's important to free meta in process context because some of resource
destruction needs mutex lock, which could be held if we releases the
resource in reclaim context so it's deadlock, again.

As a bonus, we could remove init_done check in rw path because
zram_meta_get will do a role for it, instead.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08eee69f

zram: check bd_openers instead of bd_holders · 2b269ce6

由 Minchan Kim 提交于 2月 12, 2015

bd_holders is increased only when user open the device file as FMODE_EXCL
so if something opens zram0 as !FMODE_EXCL and request I/O while another
user reset zram0, we can see following warning.

  zram0: detected capacity change from 0 to 64424509440
  Buffer I/O error on dev zram0, logical block 180823, lost async page write
  Buffer I/O error on dev zram0, logical block 180824, lost async page write
  Buffer I/O error on dev zram0, logical block 180825, lost async page write
  Buffer I/O error on dev zram0, logical block 180826, lost async page write
  Buffer I/O error on dev zram0, logical block 180827, lost async page write
  Buffer I/O error on dev zram0, logical block 180828, lost async page write
  Buffer I/O error on dev zram0, logical block 180829, lost async page write
  Buffer I/O error on dev zram0, logical block 180830, lost async page write
  Buffer I/O error on dev zram0, logical block 180831, lost async page write
  Buffer I/O error on dev zram0, logical block 180832, lost async page write
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
  Modules linked in:
  CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x45/0x57
    warn_slowpath_common+0x8a/0xc0
    warn_slowpath_null+0x1a/0x20
    __blkdev_put+0x1d7/0x210
    blkdev_put+0x50/0x130
    blkdev_close+0x25/0x30
    __fput+0xdf/0x1e0
    ____fput+0xe/0x10
    task_work_run+0xa7/0xe0
    do_notify_resume+0x49/0x60
    int_signal+0x12/0x17
  ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_node in blkdev_put path.

   static void bdev_write_inode(struct inode *inode)
   {
        spin_lock(&inode->i_lock);
        while (inode->i_state & I_DIRTY) {
                spin_unlock(&inode->i_lock);
                WARN_ON_ONCE(write_inode_now(inode, true)); <========= here.
                spin_lock(&inode->i_lock);
        }
        spin_unlock(&inode->i_lock);
   }

The reason is dd process encounters I/O fails due to sudden block device
disappear so in filemap_check_errors in __writeback_single_inode returns
-EIO.

If we check bd_openers instead of bd_holders, we could address the
problem.  When I see the brd, it already have used it rather than
bd_holders so although I'm not a expert of block layer, it seems to be
better.

I can make following warning with below simple script.  In addition, I
added msleep(2000) below set_capacity(zram->disk, 0) after applying your
patch to make window huge(Kudos to Ganesh!)

script:

   echo $((60<<30)) > /sys/block/zram0/disksize
   setsid dd if=/dev/zero of=/dev/zram0 &
   sleep 1
   setsid echo 1 > /sys/block/zram0/reset
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b269ce6

zram: rework reset and destroy path · a096cafc

由 Sergey Senozhatsky 提交于 2月 12, 2015

We need to return set_capacity(disk, 0) from reset_store() back to
zram_reset_device(), a catch by Ganesh Mahendran.  Potentially, we can
race set_capacity() calls from init and reset paths.

The problem is that zram_reset_device() is also getting called from
zram_exit(), which performs operations in misleading reversed order -- we
first create_device() and then init it, while zram_exit() perform
destroy_device() first and then does zram_reset_device().  This is done to
remove sysfs group before we reset device, so we can continue with device
reset/destruction not being raced by sysfs attr write (f.e.  disksize).

Apart from that, destroy_device() releases zram->disk (but we still have
->disk pointer), so we cannot acces zram->disk in later
zram_reset_device() call, which may cause additional errors in the future.

So, this patch rework and cleanup destroy path.

1) remove several unneeded goto labels in zram_init()

2) factor out zram_init() error path and zram_exit() into
   destroy_devices() function, which takes the number of devices to
   destroy as its argument.

3) remove sysfs group in destroy_devices() first, so we can reorder
   operations -- reset device (as expected) goes before disk destroy and
   queue cleanup.  So we can always access ->disk in zram_reset_device().

4) and, finally, return set_capacity() back under ->init_lock.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a096cafc

zram: fix umount-reset_store-mount race condition · ba6b17d6

由 Sergey Senozhatsky 提交于 2月 12, 2015

Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to
avoid ->bd_holders race condition:

        CPU0                            CPU1
umount /* zram->init_done is true */
reset_store()
bdev->bd_holders == 0                   mount
...                                     zram_make_request()
zram_reset_device()

However, his solution required some considerable amount of code movement,
which we can avoid.

Apart from using bdev->bd_mutex in reset_store(), this patch also
simplifies zram_reset_device().

zram_reset_device() has a bool parameter reset_capacity which tells it
whether disk capacity and itself disk should be reset.  There are two
zram_reset_device() callers:

-- zram_exit() passes reset_capacity=false
-- reset_store() passes reset_capacity=true

So we can move reset_capacity-sensitive work out of zram_reset_device()
and perform it unconditionally in reset_store().  This also lets us drop
reset_capacity parameter from zram_reset_device() and pass zram pointer
only.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba6b17d6

zram: free meta table in zram_meta_free · 1fec1172

由 Ganesh Mahendran 提交于 2月 12, 2015

zram_meta_alloc() and zram_meta_free() are a pair.  In
zram_meta_alloc(), meta table is allocated.  So it it better to free it
in zram_meta_free().
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1fec1172

zram: clean up zram_meta_alloc() · b8179958

由 Sergey Senozhatsky 提交于 2月 12, 2015

A trivial cleanup of zram_meta_alloc() error handling.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8179958

28 1月, 2015 4 次提交

xen-blkback: safely unmap grants in case they are still in use · c43cf3ea

由 Jennifer Herbert 提交于 1月 05, 2015

Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.

This allows blkback to use network storage which may retain refs to
pages in queued skbs after the block I/O has completed.
Signed-off-by: NJennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Acked-by: NJens Axboe <axboe@kernel.de>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

c43cf3ea

xen/grant-table: add helpers for allocating pages · ff4b156f

由 David Vrabel 提交于 1月 08, 2015

Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
suitable to for granted maps.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

ff4b156f

rbd: drop parent_ref in rbd_dev_unprobe() unconditionally · e69b8d41

由 Ilya Dryomov 提交于 1月 19, 2015

This effectively reverts the last hunk of 392a9dad ("rbd: detect
when clone image is flattened").

The problem with parent_overlap != 0 condition is that it's possible
and completely valid to have an image with parent_overlap == 0 whose
parent state needs to be cleaned up on unmap.  The next commit, which
drops the "clone image now standalone" logic, opens up another window
of opportunity to hit this, but even without it

    # cat parent-ref.sh
    #!/bin/bash
    rbd create --image-format 2 --size 1 foo
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    rbd resize --allow-shrink --size 0 bar
    rbd resize --size 1 bar
    DEV=$(rbd map bar)
    rbd unmap $DEV

leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client
hanging around.

My thinking behind calling rbd_dev_parent_put() unconditionally is that
there shouldn't be any requests in flight at that point in time as we
are deep into unmap sequence.  Hence, even if rbd_dev_unparent() caused
by flatten is delayed by in-flight requests, it will have finished by
the time we reach rbd_dev_unprobe() caused by unmap, thus turning
unconditional rbd_dev_parent_put() into a no-op.

Fixes: http://tracker.ceph.com/issues/10352

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

e69b8d41

rbd: fix rbd_dev_parent_get() when parent_overlap == 0 · ae43e9d0

由 Ilya Dryomov 提交于 1月 19, 2015

The comment for rbd_dev_parent_get() said

    * We must get the reference before checking for the overlap to
    * coordinate properly with zeroing the parent overlap in
    * rbd_dev_v2_parent_info() when an image gets flattened.  We
    * drop it again if there is no overlap.

but the "drop it again if there is no overlap" part was missing from
the implementation.  This lead to absurd parent_ref values for images
with parent_overlap == 0, as parent_ref was incremented for each
img_request and virtually never decremented.

Fix this by leveraging the fact that refresh path calls
rbd_dev_v2_parent_info() under header_rwsem and use it for read in
rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
they'd pair with now and I suspect we are in a pretty miserable
situation as far as proper locking goes regardless.

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

ae43e9d0

16 1月, 2015 1 次提交

NVMe: cq_vector should be signed · 6222d172

由 Jens Axboe 提交于 1月 15, 2015

This was inadvertently dropped from an earlier commit, otherwise
the check against cq_vector == -1 to prevent double free doesn't
make any sense.

Fixes: 2b25d981Signed-off-by: NJens Axboe <axboe@fb.com>

6222d172

09 1月, 2015 6 次提交

NVMe: Fix locking on abort handling · 7a509a6b

由 Keith Busch 提交于 1月 07, 2015

The queues and device need to be locked when messing with them.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

7a509a6b

NVMe: Start and stop h/w queues on reset · c9d3bf88

由 Keith Busch 提交于 1月 07, 2015

This freezes and stops all the queues on device shutdown and restarts
them on resume. This fixes hotplug and reset issues when the controller
is actively being used.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

c9d3bf88

NVMe: Command abort handling fixes · cef6a948

由 Keith Busch 提交于 1月 07, 2015

Aborts all requeued commands prior to killing the request_queue. For
commands that time out on a dying request queue, set the "Do Not Retry"
bit on the command status so the command cannot be requeued. Finanally, if
the driver is requested to abort a command it did not start, do nothing.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

cef6a948

NVMe: Admin queue removal handling · 0fb59cbc

由 Keith Busch 提交于 1月 07, 2015

This protects admin queue access on shutdown. When the controller is
disabled, the queue is frozen to prevent new entry, and unfrozen on
resume, and fixes cq_vector signedness to not suspend a queue twice.

Since unfreezing the queue makes it available for commands, it requires
the queue be initialized, so this moves this part after that.

Special handling is done when the device is unresponsive during
shutdown. This can be optimized to not require subsequent commands to
timeout, but saving that fix for later.

This patch also removes the kill signals in this path that were left-over
artifacts from the blk-mq conversion and no longer necessary.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

0fb59cbc

NVMe: Reference count admin queue usage · ea191d2f

由 Keith Busch 提交于 1月 07, 2015

Since there is no gendisk associated with the admin queue, the driver
needs to hold a reference to it until all open references to the
controller are closed.

This also combines queue cleanup with freeing the tag set since these
should not be separate.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

ea191d2f

NVMe: Start all requests · c917dfe5

由 Keith Busch 提交于 1月 07, 2015

Once the nvme callback is set for a request, the driver can start it
and make it available for timeout handling. For timed out commands on a
device that is not initialized, this fixes potential deadlocks that can
occur on startup and shutdown when a device is unresponsive since they
can now be cancelled.

Asynchronous requests do not have any expected timeout, so these are
using the new "REQ_NO_TIMEOUT" request flags.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

c917dfe5

03 1月, 2015 1 次提交

block: fix checking return value of blk_mq_init_queue · 35b489d3

由 Ming Lei 提交于 1月 02, 2015

Check IS_ERR_OR_NULL(return value) instead of just return value.
Signed-off-by: NMing Lei <ming.lei@canonical.com>

Reduced to IS_ERR() by me, we never return NULL.
Signed-off-by: NJens Axboe <axboe@fb.com>

35b489d3

23 12月, 2014 1 次提交

NVMe: Fix double free irq · 2b25d981

由 Keith Busch 提交于 12月 22, 2014

Sets the vector to an invalid value after it's freed so we don't free
it twice.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2b25d981

18 12月, 2014 2 次提交

rbd: don't treat CEPH_OSD_OP_DELETE as extent op · 7e868b6e

由 Ilya Dryomov 提交于 11月 21, 2014

CEPH_OSD_OP_DELETE is not an extent op, stop treating it as such.  This
sneaked in with discard patches - it's one of the three osd ops (the
other two are CEPH_OSD_OP_TRUNCATE and CEPH_OSD_OP_ZERO) that discard
is implemented with.
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

7e868b6e

ceph, rbd: delete unnecessary checks before two function calls · e96a650a

由 SF Markus Elfring 提交于 11月 02, 2014

The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
[idryomov@redhat.com: squashed rbd.c hunk, changelog]
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

e96a650a

14 12月, 2014 5 次提交

zram: use DEVICE_ATTR_[RW|RO|WO] to define zram sys device attribute · 083914ea

由 Ganesh Mahendran 提交于 12月 12, 2014

In current zram, we use DEVICE_ATTR() to define sys device attributes.
SO, we need to set (S_IRUGO | S_IWUSR) permission and other arguments
manually.  Linux already provids the macro DEVICE_ATTR_[RW|RO|WO] to
define sys device attribute.  It is simple and readable.

This patch uses kernel defined macro DEVICE_ATTR_[RW|RO|WO] to define
zram device attribute.
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

083914ea

mm/zram: correct ZRAM_ZERO flag bit position · d49b1c25

由 Mahendran Ganesh 提交于 12月 12, 2014

In struct zram_table_entry, the element *value* contains obj size and obj
zram flags.  Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and
bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj
zram_flags.  So the first zram flag(ZRAM_ZERO) should be from
ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1).

This patch fixes this cosmetic issue.

Also fix a typo, "page in now accessed" -> "page is now accessed"
Signed-off-by: NMahendran Ganesh <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NWeijie Yang <weijie.yang@samsung.com>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d49b1c25

zram: implement rw_page operation of zram · 8c7f0102

由 karam.lee 提交于 12月 12, 2014

This patch implements rw_page operation for zram block device.

I implemented the feature in zram and tested it.  Test bed was the G2, LG
electronic mobile device, whtich has msm8974 processor and 2GB memory.

With a memory allocation test program consuming memory, the system
generates swap.

Operating time of swap_write_page() was measured.

--------------------------------------------------
|             |   operating time   | improvement |
|             |  (20 runs average) |             |
--------------------------------------------------
|with patch   |    1061.15 us      |    +2.4%    |
--------------------------------------------------
|without patch|    1087.35 us      |             |
--------------------------------------------------

Each test(with paged_io,with BIO) result set shows normal distribution and
has equal variance.  I mean the two values are valid result to compare.  I
can say operation with paged I/O(without BIO) is faster 2.4% with
confidence level 95%.

[minchan@kernel.org: make rw_page opeartion return 0]
[minchan@kernel.org: rely on the bi_end_io for zram_rw_page fails]
[sergey.senozhatsky@gmail.com: code cleanup]
[minchan@kernel.org: add comment]
Signed-off-by: Nkaram.lee <karam.lee@lge.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c7f0102

zram: change parameter from vaild_io_request() · 54850e73

由 karam.lee 提交于 12月 12, 2014

This patch changes parameter of valid_io_request for common usage.  The
purpose of valid_io_request() is to determine if bio request is valid or
not.

This patch use I/O start address and size instead of a BIO parameter for
common usage.
Signed-off-by: Nkaram.lee <karam.lee@lge.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

54850e73

zram: remove bio parameter from zram_bvec_rw() · b627cff3

由 karam.lee 提交于 12月 12, 2014

Recently rw_page block device operation has been added.  This patchset
implements rw_page operation for zram block device and does some clean-up.

This patch (of 3):

Remove an unnecessary parameter(bio) from zram_bvec_rw() and
zram_bvec_read().  zram_bvec_read() doesn't use a bio parameter, so remove
it.  zram_bvec_rw() calls a read/write operation not using bio, so a rw
parameter replaces a bio parameter.
Signed-off-by: Nkaram.lee <karam.lee@lge.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b627cff3

12 12月, 2014 5 次提交

NVMe: fix race condition in nvme_submit_sync_cmd() · 849c6e77

由 Jens Axboe 提交于 12月 12, 2014

If we have a race between the schedule timing out and the command
completing, we could have the task issuing the command exit
nvme_submit_sync_cmd() while the irq is running sync_completion().
If that happens, we could be corrupting memory, since the stack
that held 'cmdinfo' is no longer valid.

Fix this by always calling nvme_abort_cmd_info(). Once that call
completes, we know that we have either run sync_completion() if
the completion came in, or that we will never run it since we now
have special_completion() as the command callback handler.
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

849c6e77

sunvdc: reconnect ldc after vds service domain restarts · 76e74bbe

由 Dwight Engen 提交于 12月 11, 2014

This change enables the sunvdc driver to reconnect and recover if a vds
service domain is disconnected or bounced.

By default, it will wait indefinitely for the service domain to become
available again, but will honor a non-zero vdc-timout md property if one
is set. If a timeout is reached, any in-progress I/O's are completed
with -EIO.
Signed-off-by: NDwight Engen <dwight.engen@oracle.com>
Reviewed-by: NChris Hyser <chris.hyser@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76e74bbe

vio: create routines for inc,dec vio dring indexes · fe47c3c2

由 Dwight Engen 提交于 12月 11, 2014

Both sunvdc and sunvnet implemented distinct functionality for incrementing
and decrementing dring indexes. Create common functions for use by both
from the sunvnet versions, which were chosen since they will still work
correctly in case a non power of two ring size is used.
Signed-off-by: NDwight Engen <dwight.engen@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe47c3c2

sunvdc: fix module unload/reload · 31f4888f

由 Dwight Engen 提交于 12月 11, 2014

Free resources allocated during port/disk probing so that the module may be
successfully reloaded after unloading.
Signed-off-by: NDwight Engen <dwight.engen@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31f4888f

NVMe: fix retry/error logic in nvme_queue_rq() · fe54303e

由 Jens Axboe 提交于 12月 11, 2014

The logic around retrying and erroring IO in nvme_queue_rq() is broken
in a few ways:

- If we fail allocating dma memory for a discard, we return retry. We
  have the 'iod' stored in ->special, but we free the 'iod'.

- For a normal request, if we fail dma mapping of setting up prps, we
  have the same iod situation. Additionally, we haven't set the callback
  for the request yet, so we also potentially leak IOMMU resources.

Get rid of the ->special 'iod' store. The retry is uncommon enough that
it's not worth optimizing for or holding on to resources to attempt to
speed it up. Additionally, it's usually best practice to free any
request related resources when doing retries.
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

fe54303e

11 12月, 2014 5 次提交

NVMe: Fix FS mount issue (hot-remove followed by hot-add) · 285dffc9

由 Indraneel M 提交于 12月 11, 2014

After Hot-remove of a device with a mounted partition,
when the device is hot-added again, the new node reappears
as nvme0n1. Mounting this new node fails with the error:

mount: mount /dev/nvme0n1p1 on /mnt failed: File exists.

The old nodes's FS entries still exist and the kernel can't re-create
procfs and sysfs entries for the new node with the same name.
The patch fixes this issue.
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NIndraneel M <indraneel.m@samsung.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

285dffc9

NVMe: fix error return checking from blk_mq_alloc_request() · 97fe3832

由 Jens Axboe 提交于 12月 10, 2014

We return an error pointer or the request, not NULL. Half
the call paths got it right, the others didn't. Fix those up.
Signed-off-by: NJens Axboe <axboe@fb.com>

97fe3832

NVMe: fix freeing of wrong request in abort path · c87fd540

由 Sam Bradshaw 提交于 12月 10, 2014

We allocate 'abort_req', but free 'req' in case of an error
submitting the IO.
Signed-off-by: NSam Bradshaw <sbradshaw@micron.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

c87fd540

xen/blkfront: remove redundant flush_op · fdf9b965

由 Vitaly Kuznetsov 提交于 12月 09, 2014

flush_op is unambiguously defined by feature_flush:
    REQ_FUA | REQ_FLUSH -> BLKIF_OP_WRITE_BARRIER
    REQ_FLUSH -> BLKIF_OP_FLUSH_DISKCACHE
    0 -> 0
and thus can be removed. This is just a cleanup.

The patch was suggested by Boris Ostrovsky.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

fdf9b965

xen/blkfront: improve protection against issuing unsupported REQ_FUA · ad42d391

由 Vitaly Kuznetsov 提交于 12月 01, 2014

Guard against issuing unsupported REQ_FUA and REQ_FLUSH was introduced
in d11e6158 and was factored out into blkif_request_flush_valid() in
0f1ca65e. However:
1) This check in incomplete. In case we negotiated to feature_flush = REQ_FLUSH
   and flush_op = BLKIF_OP_FLUSH_DISKCACHE (so FUA is unsupported) FUA request
   will still pass the check.
2) blkif_request_flush_valid() is misnamed. It is bool but returns true when
   the request is invalid.
3) When blkif_request_flush_valid() fails -EIO is being returned. It seems that
   -EOPNOTSUPP is more appropriate here.
Fix all of the above issues.

This patch is based on the original patch by Laszlo Ersek and a comment by
Jeff Moyer.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NLaszlo Ersek <lersek@redhat.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ad42d391

09 12月, 2014 4 次提交

virtio: drop VIRTIO_F_VERSION_1 from drivers · 51cdc381

由 Michael S. Tsirkin 提交于 12月 01, 2014

Core activates this bit automatically now,
drop it from drivers that set it explicitly.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

51cdc381

virtio_blk: fix race at module removal · 38f37b57

由 Michael S. Tsirkin 提交于 10月 23, 2014

If a device appears while module is being removed,
driver will get a callback after we've given up
on the major number.

In theory this means this major number can get reused
by something else, resulting in a conflict.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>

38f37b57

virtio_blk: make serial attribute static · 393c525b

由 Michael S. Tsirkin 提交于 10月 23, 2014

It's never declared so no need to make it extern.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>

393c525b

virtio_blk: v1.0 support · 19c1c5a6

由 Michael S. Tsirkin 提交于 10月 07, 2014

Based on patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors,
      convert all fields, even those no longer in use
      for virtio v1.0.
Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

19c1c5a6

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功