提交 · 1fe480235ad7236e8ea6c167af5a5d1ac24f8a88 · openanolis / cloud-kernel

20 4月, 2015 1 次提交

rbd: be more informative on -ENOENT failures · 1fe48023

由 Ilya Dryomov 提交于 3月 05, 2015

pr_info what exactly was the culprit: missing pool, image or snap.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1fe48023

23 3月, 2015 1 次提交

NVMe: Initialize device list head before starting · e6e96d73

由 Keith Busch 提交于 3月 23, 2015

Driver recovery requires the device's list node to have been initialized.

Fixes: https://lkml.org/lkml/2015/3/22/262Reported-by: NSteven Noonan <steven@uplinklabs.net>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e6e96d73

05 3月, 2015 1 次提交

nbd: fix possible memory leak · ff6b8090

由 Sudip Mukherjee 提交于 1月 27, 2015

we have already allocated memory for nbd_dev, but we were not
releasing that memory and just returning the error value.
Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
Acked-by: NPaul Clements <Paul.Clements@SteelEye.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>

ff6b8090

01 3月, 2015 1 次提交

zram: use proper type to update max_used_pages · 2ea55a2c

由 Joonsoo Kim 提交于 2月 27, 2015

max_used_pages is defined as atomic_long_t so we need to use unsigned
long to keep temporary value for it rather than int which is smaller
than unsigned long in a 64 bit system.
Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ea55a2c

24 2月, 2015 1 次提交

NVMe: Fix for BLK_DEV_INTEGRITY not set · 52b68d7e

由 Keith Busch 提交于 2月 23, 2015

Need to define and use appropriate functions for when BLK_DEV_INTEGRITY
is not set.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

52b68d7e

20 2月, 2015 8 次提交

NVMe: Fix potential corruption on sync commands · 0c0f9b95

由 Keith Busch 提交于 2月 19, 2015

This makes all sync commands uninterruptible and schedules without timeout
so the controller either has to post a completion or the timeout recovery
fails the command. This fixes potential memory or data corruption from
a command timing out too early or woken by a signal. Previously any DMA
buffers mapped for that command would have been released even though we
don't know what the controller is planning to do with those addresses.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

0c0f9b95

NVMe: Remove unused variables · 48328518

由 Keith Busch 提交于 2月 19, 2015

We don't track queues in a llist, subscribe to hot-cpu notifications,
or internally retry commands. Delete the unused artifacts.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

48328518

NVMe: Fix scsi mode select llbaa setting · 9ac16938

由 Keith Busch 提交于 1月 09, 2015

It should be a logical bitwise AND, not conditional.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

9ac16938

NVMe: Fix potential corruption during shutdown · 07836e65

由 Keith Busch 提交于 2月 19, 2015

The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.

This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is not inlined with pci probe anymore.
Reported-by: NNilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>

07836e65

NVMe: Asynchronous controller probe · 2e1d8448

由 Keith Busch 提交于 2月 12, 2015

This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

2e1d8448

NVMe: Register management handle under nvme class · b3fffdef

由 Keith Busch 提交于 2月 03, 2015

This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.

Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

b3fffdef

NVMe: Update SCSI Inquiry VPD 83h translation · 4f1982b4

由 Keith Busch 提交于 2月 19, 2015

The original translation created collisions on Inquiry VPD 83 for many
existing devices. Newer specifications provide other ways to translate
based on the device's version can be used to create unique identifiers.

Version 1.1 provides an EUI64 field that uniquely identifies each
namespace, and 1.2 added the longer NGUID field for the same reason.
Both follow the IEEE EUI format and readily translate to the SCSI device
identification EUI designator type 2h. For devices implementing either,
the translation will use this type, defaulting to the EUI64 8-byte type if
implemented then NGUID's 16 byte version if not. If neither are provided,
the 1.0 translation is used, and is updated to use the SCSI String format
to guarantee a unique identifier.

Knowing when to use the new fields depends on the nvme controller's
revision. The NVME_VS macro was not decoding this correctly, so that is
fixed in this patch and moved to a more appropriate place.

Since the Identify Namespace structure required an update for the NGUID
field, this patch adds the remaining new 1.2 fields to the structure.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

4f1982b4

NVMe: Metadata format support · e1e5e564

由 Keith Busch 提交于 2月 19, 2015

Adds support for NVMe metadata formats and exposes block devices for
all namespaces regardless of their format. Namespace formats that are
unusable will have disk capacity set to 0, but a handle to the block
device is created to simplify device management. A namespace is not
usable when the format requires host interleave block and metadata in
single buffer, has no provisioned storage, or has better data but failed
to register with blk integrity.

The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarilly set to 0 until it can be revalidated and registered with
the integrity extenstions after add_disk completes.

The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this be a single physically contiguous region,
so only one integrity segment is allowed per command. If the metadata
is used for T10 PI, the driver provides mappings to save and restore
the reftag physical block translation. The driver provides no-op
functions for generate and verify if metadata is not used for protection
information. This way the setup is always provided by the block layer.

If a request does not supply a required metadata buffer, the command
is failed with bad address. This could only happen if a user manually
disables verify/generate on such a disk. The only exception to where
this is okay is if the controller is capable of stripping/generating
the metadata, which is possible on some types of formats.

The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IOD's, but we don't do that
anymore, so the field was unused.
Signed-off-by: NKeith Busch <keith.busch@intel.com>

e1e5e564

19 2月, 2015 4 次提交

rbd: convert to blk-mq · 7ad18afa

由 Christoph Hellwig 提交于 1月 13, 2015

This converts the rbd driver to use the blk-mq infrastructure.  Except
for switching to a per-request work item this is almost mechanical.

This was tested by Alexandre DERUMIER in November, and found to give
him 120000 iops, although the only comparism available was an old
3.10 kernel which gave 80000iops.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <elder@linaro.org>
[idryomov@gmail.com: context, blk_mq_init_queue() EH]
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7ad18afa

rbd: do not treat standalone as flatten · cf32bd9c

由 Ilya Dryomov 提交于 1月 19, 2015

If the clone is resized down to 0, it becomes standalone.  If such
resize is carried over while an image is mapped we would detect this
and call rbd_dev_parent_put() which means "let go of all parent state,
including the spec(s) of parent images(s)".  This leads to a mismatch
between "rbd info" and sysfs parent fields, so a fix is in order.

    # rbd create --image-format 2 --size 1 foo
    # rbd snap create foo@snap
    # rbd snap protect foo@snap
    # rbd clone foo@snap bar
    # DEV=$(rbd map bar)
    # rbd resize --allow-shrink --size 0 bar
    # rbd resize --size 1 bar
    # rbd info bar | grep parent
            parent: rbd/foo@snap

Before:

    # cat /sys/bus/rbd/devices/0/parent
    (no parent image)

After:

    # cat /sys/bus/rbd/devices/0/parent
    pool_id 0
    pool_name rbd
    image_id 10056b8b4567
    image_name foo
    snap_id 2
    snap_name snap
    overlap 0
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

cf32bd9c

rbd: fix error paths in rbd_dev_refresh() · 73e39e4d

由 Ilya Dryomov 提交于 1月 08, 2015

header_rwsem should be released on errors.  Also remove useless
rbd_dev->mapping.size != rbd_dev->header.image_size test.
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

73e39e4d

rbd: nuke copy_token() · 3a25cf43

由 Rickard Strandqvist 提交于 1月 01, 2015

It's been largely superseded by dup_token() and unused for over
2 years, identified by cppcheck.
Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[idryomov@redhat.com: changelog]
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

3a25cf43

17 2月, 2015 1 次提交

brd: rename XIP to DAX · a7a97fc9

由 Matthew Wilcox 提交于 2月 16, 2015

Since this is relating to FS_XIP, not KERNEL_XIP, it should be called
DAX instead of XIP.
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a7a97fc9

13 2月, 2015 9 次提交

kernel.h: remove ancient __FUNCTION__ hack · 02f1f217

由 Rasmus Villemoes 提交于 2月 12, 2015

__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so
this only helps people who only test-compile using 3.3 (compiler-gcc3.h
barks at anything older than that).  Besides, there are almost no
occurrences of __FUNCTION__ left in the tree.

[akpm@linux-foundation.org: convert remaining __FUNCTION__ references]
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02f1f217

mm/zpool: add name argument to create zpool · 3eba0c6a

由 Ganesh Mahendran 提交于 2月 12, 2015

Currently the underlay of zpool: zsmalloc/zbud, do not know who creates
them.  There is not a method to let zsmalloc/zbud find which caller they
belong to.

Now we want to add statistics collection in zsmalloc.  We need to name the
debugfs dir for each pool created.  The way suggested by Minchan Kim is to
use a name passed by caller(such as zram) to create the zsmalloc pool.

    /sys/kernel/debug/zsmalloc/zram0

This patch adds an argument `name' to zs_create_pool() and other related
functions.
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3eba0c6a

zram: remove request_queue from struct zram · ee980160

由 Sergey Senozhatsky 提交于 2月 12, 2015

`struct zram' contains both `struct gendisk' and `struct request_queue'.
the latter can be deleted, because zram->disk carries ->queue pointer, and
->queue carries zram pointer:

create_device()
	zram->queue->queuedata = zram
	zram->disk->queue = zram->queue
	zram->disk->private_data = zram

so zram->queue is not needed, we can access all necessary data anyway.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee980160

zram: remove init_lock in zram_make_request · 08eee69f

由 Minchan Kim 提交于 2月 12, 2015

Admin could reset zram during I/O operation going on so we have used
zram->init_lock as read-side lock in I/O path to prevent sudden zram
meta freeing.

However, the init_lock is really troublesome.  We can't do call
zram_meta_alloc under init_lock due to lockdep splat because
zram_rw_page is one of the function under reclaim path and hold it as
read_lock while other places in process context hold it as write_lock.
So, we have used allocation out of the lock to avoid lockdep warn but
it's not good for readability and fainally, I met another lockdep splat
between init_lock and cpu_hotplug from kmem_cache_destroy during working
zsmalloc compaction.  :(

Yes, the ideal is to remove horrible init_lock of zram in rw path.  This
patch removes it in rw path and instead, add atomic refcount for meta
lifetime management and completion to free meta in process context.
It's important to free meta in process context because some of resource
destruction needs mutex lock, which could be held if we releases the
resource in reclaim context so it's deadlock, again.

As a bonus, we could remove init_done check in rw path because
zram_meta_get will do a role for it, instead.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08eee69f

zram: check bd_openers instead of bd_holders · 2b269ce6

由 Minchan Kim 提交于 2月 12, 2015

bd_holders is increased only when user open the device file as FMODE_EXCL
so if something opens zram0 as !FMODE_EXCL and request I/O while another
user reset zram0, we can see following warning.

  zram0: detected capacity change from 0 to 64424509440
  Buffer I/O error on dev zram0, logical block 180823, lost async page write
  Buffer I/O error on dev zram0, logical block 180824, lost async page write
  Buffer I/O error on dev zram0, logical block 180825, lost async page write
  Buffer I/O error on dev zram0, logical block 180826, lost async page write
  Buffer I/O error on dev zram0, logical block 180827, lost async page write
  Buffer I/O error on dev zram0, logical block 180828, lost async page write
  Buffer I/O error on dev zram0, logical block 180829, lost async page write
  Buffer I/O error on dev zram0, logical block 180830, lost async page write
  Buffer I/O error on dev zram0, logical block 180831, lost async page write
  Buffer I/O error on dev zram0, logical block 180832, lost async page write
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
  Modules linked in:
  CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x45/0x57
    warn_slowpath_common+0x8a/0xc0
    warn_slowpath_null+0x1a/0x20
    __blkdev_put+0x1d7/0x210
    blkdev_put+0x50/0x130
    blkdev_close+0x25/0x30
    __fput+0xdf/0x1e0
    ____fput+0xe/0x10
    task_work_run+0xa7/0xe0
    do_notify_resume+0x49/0x60
    int_signal+0x12/0x17
  ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_node in blkdev_put path.

   static void bdev_write_inode(struct inode *inode)
   {
        spin_lock(&inode->i_lock);
        while (inode->i_state & I_DIRTY) {
                spin_unlock(&inode->i_lock);
                WARN_ON_ONCE(write_inode_now(inode, true)); <========= here.
                spin_lock(&inode->i_lock);
        }
        spin_unlock(&inode->i_lock);
   }

The reason is dd process encounters I/O fails due to sudden block device
disappear so in filemap_check_errors in __writeback_single_inode returns
-EIO.

If we check bd_openers instead of bd_holders, we could address the
problem.  When I see the brd, it already have used it rather than
bd_holders so although I'm not a expert of block layer, it seems to be
better.

I can make following warning with below simple script.  In addition, I
added msleep(2000) below set_capacity(zram->disk, 0) after applying your
patch to make window huge(Kudos to Ganesh!)

script:

   echo $((60<<30)) > /sys/block/zram0/disksize
   setsid dd if=/dev/zero of=/dev/zram0 &
   sleep 1
   setsid echo 1 > /sys/block/zram0/reset
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b269ce6

zram: rework reset and destroy path · a096cafc

由 Sergey Senozhatsky 提交于 2月 12, 2015

We need to return set_capacity(disk, 0) from reset_store() back to
zram_reset_device(), a catch by Ganesh Mahendran.  Potentially, we can
race set_capacity() calls from init and reset paths.

The problem is that zram_reset_device() is also getting called from
zram_exit(), which performs operations in misleading reversed order -- we
first create_device() and then init it, while zram_exit() perform
destroy_device() first and then does zram_reset_device().  This is done to
remove sysfs group before we reset device, so we can continue with device
reset/destruction not being raced by sysfs attr write (f.e.  disksize).

Apart from that, destroy_device() releases zram->disk (but we still have
->disk pointer), so we cannot acces zram->disk in later
zram_reset_device() call, which may cause additional errors in the future.

So, this patch rework and cleanup destroy path.

1) remove several unneeded goto labels in zram_init()

2) factor out zram_init() error path and zram_exit() into
   destroy_devices() function, which takes the number of devices to
   destroy as its argument.

3) remove sysfs group in destroy_devices() first, so we can reorder
   operations -- reset device (as expected) goes before disk destroy and
   queue cleanup.  So we can always access ->disk in zram_reset_device().

4) and, finally, return set_capacity() back under ->init_lock.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a096cafc

zram: fix umount-reset_store-mount race condition · ba6b17d6

由 Sergey Senozhatsky 提交于 2月 12, 2015

Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to
avoid ->bd_holders race condition:

        CPU0                            CPU1
umount /* zram->init_done is true */
reset_store()
bdev->bd_holders == 0                   mount
...                                     zram_make_request()
zram_reset_device()

However, his solution required some considerable amount of code movement,
which we can avoid.

Apart from using bdev->bd_mutex in reset_store(), this patch also
simplifies zram_reset_device().

zram_reset_device() has a bool parameter reset_capacity which tells it
whether disk capacity and itself disk should be reset.  There are two
zram_reset_device() callers:

-- zram_exit() passes reset_capacity=false
-- reset_store() passes reset_capacity=true

So we can move reset_capacity-sensitive work out of zram_reset_device()
and perform it unconditionally in reset_store().  This also lets us drop
reset_capacity parameter from zram_reset_device() and pass zram pointer
only.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba6b17d6

zram: free meta table in zram_meta_free · 1fec1172

由 Ganesh Mahendran 提交于 2月 12, 2015

zram_meta_alloc() and zram_meta_free() are a pair.  In
zram_meta_alloc(), meta table is allocated.  So it it better to free it
in zram_meta_free().
Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1fec1172

zram: clean up zram_meta_alloc() · b8179958

由 Sergey Senozhatsky 提交于 2月 12, 2015

A trivial cleanup of zram_meta_alloc() error handling.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8179958

11 2月, 2015 2 次提交

xen-blkback: default to X86_32 ABI on x86 · b042a3ca

由 David Vrabel 提交于 2月 05, 2015

Prior to the existance of 64-bit backends using the X86_64 ABI,
frontends used the X86_32 ABI.  These old frontends do not specify the
ABI and when used with a 64-bit backend do not work.

On x86, default to the X86_32 ABI if one is not specified.  Backends
on ARM continue to default to their NATIVE ABI.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>

b042a3ca

xen-blkfront: fix accounting of reqs when migrating · 3bb8c98e

由 Roger Pau Monne 提交于 2月 02, 2015

Current migration code uses blk_put_request in order to finish a request
before requeuing it. This function doesn't update the statistics of the
queue, which completely screws accounting. Use blk_end_request_all instead
which properly updates the statistics of the queue.
Signed-off-by: NRoger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: NOuyang Zhaowei (Charles) <ouyangzhaowei@huawei.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org

3bb8c98e

03 2月, 2015 1 次提交

floppy: Avoid manual call of device_create_file() · b7f120b2

由 Takashi Iwai 提交于 2月 02, 2015

Use the static attribute groups assigned to the device instead of
calling device_create_file() after the device registration.
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

b7f120b2

30 1月, 2015 1 次提交

NVMe: avoid kmalloc/kfree for smaller IO · ac3dd5bd

由 Jens Axboe 提交于 1月 22, 2015

Currently we allocate an nvme_iod for each IO, which holds the
sg list, prps, and other IO related info. Set a threshold of
2 pages and/or 8KB of data, below which we can just embed this
in the per-command pdu in blk-mq. For any IO at or below
NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree.

For higher IOPS, this saves up to 1% of CPU time.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

ac3dd5bd

28 1月, 2015 4 次提交

xen-blkback: safely unmap grants in case they are still in use · c43cf3ea

由 Jennifer Herbert 提交于 1月 05, 2015

Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.

This allows blkback to use network storage which may retain refs to
pages in queued skbs after the block I/O has completed.
Signed-off-by: NJennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Acked-by: NJens Axboe <axboe@kernel.de>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

c43cf3ea

xen/grant-table: add helpers for allocating pages · ff4b156f

由 David Vrabel 提交于 1月 08, 2015

Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
suitable to for granted maps.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

ff4b156f

rbd: drop parent_ref in rbd_dev_unprobe() unconditionally · e69b8d41

由 Ilya Dryomov 提交于 1月 19, 2015

This effectively reverts the last hunk of 392a9dad ("rbd: detect
when clone image is flattened").

The problem with parent_overlap != 0 condition is that it's possible
and completely valid to have an image with parent_overlap == 0 whose
parent state needs to be cleaned up on unmap.  The next commit, which
drops the "clone image now standalone" logic, opens up another window
of opportunity to hit this, but even without it

    # cat parent-ref.sh
    #!/bin/bash
    rbd create --image-format 2 --size 1 foo
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    rbd resize --allow-shrink --size 0 bar
    rbd resize --size 1 bar
    DEV=$(rbd map bar)
    rbd unmap $DEV

leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client
hanging around.

My thinking behind calling rbd_dev_parent_put() unconditionally is that
there shouldn't be any requests in flight at that point in time as we
are deep into unmap sequence.  Hence, even if rbd_dev_unparent() caused
by flatten is delayed by in-flight requests, it will have finished by
the time we reach rbd_dev_unprobe() caused by unmap, thus turning
unconditional rbd_dev_parent_put() into a no-op.

Fixes: http://tracker.ceph.com/issues/10352

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

e69b8d41

rbd: fix rbd_dev_parent_get() when parent_overlap == 0 · ae43e9d0

由 Ilya Dryomov 提交于 1月 19, 2015

The comment for rbd_dev_parent_get() said

    * We must get the reference before checking for the overlap to
    * coordinate properly with zeroing the parent overlap in
    * rbd_dev_v2_parent_info() when an image gets flattened.  We
    * drop it again if there is no overlap.

but the "drop it again if there is no overlap" part was missing from
the implementation.  This lead to absurd parent_ref values for images
with parent_overlap == 0, as parent_ref was incremented for each
img_request and virtually never decremented.

Fix this by leveraging the fact that refresh path calls
rbd_dev_v2_parent_info() under header_rwsem and use it for read in
rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
they'd pair with now and I suspect we are in a pretty miserable
situation as far as proper locking goes regardless.

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

ae43e9d0

24 1月, 2015 1 次提交

block: support different tag allocation policy · ee1b6f7a

由 Shaohua Li 提交于 1月 15, 2015

The libata tag allocation is using a round-robin policy. Next patch will
make libata use block generic tag allocation, so let's add a policy to
tag allocation.

Currently two policies: FIFO (default) and round-robin.

Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

ee1b6f7a

22 1月, 2015 2 次提交

NVMe: within nvme_free_queues(), delete RCU sychro/deferred free · 121c7ad4

由 kaoudis 提交于 1月 14, 2015

Converting from to blk-queue got rid of the driver's RCU
locking-on-queue, so removing unnecessary RCU locking-on-queue
artefacts.
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NKelly Nicole Kaoudis <kaoudis@colorado.edu>
Signed-off-by: NJens Axboe <axboe@fb.com>

121c7ad4

block: Add discard flag to blkdev_issue_zeroout() function · d93ba7a5

由 Martin K. Petersen 提交于 1月 20, 2015

blkdev_issue_discard() will zero a given block range. This is done by
way of explicit writing, thus provisioning or allocating the blocks on
disk.

There are use cases where the desired behavior is to zero the blocks but
unprovision them if possible. The blocks must deterministically contain
zeroes when they are subsequently read back.

This patch adds a flag to blkdev_issue_zeroout() that provides this
variant. If the discard flag is set and a block device guarantees
discard_zeroes_data we will use REQ_DISCARD to clear the block range. If
the device does not support discard_zeroes_data or if the discard
request fails we will fall back to first REQ_WRITE_SAME and then a
regular REQ_WRITE.

Also update the callers of blkdev_issue_zero() to reflect the new flag
and make sb_issue_zeroout() prefer the discard approach.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d93ba7a5

21 1月, 2015 2 次提交

virtio_blk: coding style fixes · bb6ec576

由 Michael S. Tsirkin 提交于 1月 15, 2015

Most of our code has
struct foo {
}

Fix two instances where blk is inconsistent.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

bb6ec576

virtio/blk: verify device has config space · a4379fd8

由 Michael S. Tsirkin 提交于 1月 12, 2015

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/blk needs config space access so make it
fail gracefully if not there.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

a4379fd8

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功