1. 20 Aug, 2015 (1 commit)
  2. 15 Aug, 2015 (1 commit)
    • zram: fix pool name truncation · 4ce321f5
      Sergey Senozhatsky committed
      zram_meta_alloc() constructs a pool name for zs_create_pool() call as
      
          snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
      
      However, it defines the pool name buffer to be only 8 bytes long (7
      characters plus the trailing zero), which means that we can have only
      1000 distinct pool names: zram0 -- zram999.
      
      With CONFIG_ZSMALLOC_STAT enabled, an attempt to create device zram1000
      can fail if device zram100 already exists, because snprintf() will
      truncate the new pool name to zram100 and pass it to
      debugfs_create_dir(), causing:
      
        debugfs dir <zram100> creation failed
        zram: Error creating memory pool
      
      ... and so on.
      
      Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
      device_id.  We construct the zram%d name earlier and keep it as
      ->disk_name, so there is no need to snprintf() it again.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
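
      A minimal userspace sketch of the truncation described above; the
      buffer size and format string are taken from the commit message, the
      surrounding program is illustrative only:

          #include <stdio.h>

          int main(void)
          {
              char pool_name[8];      /* same size as the old zram buffer */
              int device_id = 1000;

              /* snprintf() always NUL-terminates, so only 7 chars fit */
              snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);

              /* prints "zram100" - collides with the pool of device 100 */
              printf("%s\n", pool_name);
              return 0;
          }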
  3. 31 Jul, 2015 (1 commit)
    • rbd: fix copyup completion race · 2761713d
      Ilya Dryomov committed
      For write/discard obj_requests that involved a copyup method call, the
      opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
      rbd_img_obj_copyup_callback().  The latter frees the copyup pages, sets
      ->xferred and delegates to rbd_img_obj_callback(), the "normal" image
      object callback, for reporting to the block layer and putting refs.
      
      rbd_osd_req_callback(), however, treats CEPH_OSD_OP_CALL as a trivial
      op, which means obj_request is marked done in rbd_osd_trivial_callback(),
      *before* ->callback is invoked and rbd_img_obj_copyup_callback() has
      a chance to run.  Marking obj_request done essentially means giving
      rbd_img_obj_callback() a license to end it at any moment, so if another
      obj_request from the same img_request is being completed concurrently,
      rbd_img_obj_end_request() may very well be called on such a prematurely
      marked-done request:
      
      <obj_request-1/2 reply>
      handle_reply()
        rbd_osd_req_callback()
          rbd_osd_trivial_callback()
          rbd_obj_request_complete()
          rbd_img_obj_copyup_callback()
          rbd_img_obj_callback()
                                          <obj_request-2/2 reply>
                                          handle_reply()
                                            rbd_osd_req_callback()
                                              rbd_osd_trivial_callback()
            for_each_obj_request(obj_request->img_request) {
              rbd_img_obj_end_request(obj_request-1/2)
              rbd_img_obj_end_request(obj_request-2/2) <--
            }
      
      Calling rbd_img_obj_end_request() on such a request leads to trouble,
      in particular because its ->xferred is 0.  We report 0 to the block
      layer with blk_update_request(), get back 1 for "this request has more
      data in flight" and then trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      with the rhs (which == ...) being 1 because rbd_img_obj_end_request()
      has been called for both requests and the lhs (more) being 1 because we
      haven't had a chance to set ->xferred in rbd_img_obj_copyup_callback()
      yet.
      
      To fix this, leverage the fact that rbd calls class methods in only two
      cases: one is a generic method call wrapper (obj_request is standalone)
      and the other is a copyup (obj_request is part of an img_request).  So
      make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
      rbd_img_obj_copyup_callback() from it if obj_request is part of an
      img_request, similar to how the CEPH_OSD_OP_READ handler invokes
      rbd_img_obj_request_read_callback().
      
      Since rbd_img_obj_copyup_callback() is now being called from the OSD
      request callback (only), it is renamed to rbd_osd_copyup_callback().
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
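
      A hedged sketch of the dedicated CEPH_OSD_OP_CALL handler described
      above (it assumes rbd.c's existing obj_request_img_data_test() and
      obj_request_done_set() helpers; this is not the exact upstream diff):

          static void rbd_osd_call_callback(struct rbd_obj_request *obj_request)
          {
              if (obj_request_img_data_test(obj_request))
                  /* copyup: finish via the img_request-aware callback */
                  rbd_osd_copyup_callback(obj_request);
              else
                  /* standalone method call: trivial completion is fine */
                  obj_request_done_set(obj_request);
          }

          /* ... and in rbd_osd_req_callback()'s opcode switch: */
          case CEPH_OSD_OP_CALL:
              rbd_osd_call_callback(obj_request);
              break;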
  4. 24 Jul, 2015 (3 commits)
  5. 23 Jul, 2015 (1 commit)
  6. 16 Jul, 2015 (1 commit)
  7. 02 Jul, 2015 (1 commit)
  8. 01 Jul, 2015 (1 commit)
    • rbd: use GFP_NOIO in rbd_obj_request_create() · 5a60e876
      Ilya Dryomov committed
      rbd_obj_request_create() is called on the main I/O path, so we need to
      use GFP_NOIO to make sure allocation doesn't blow back on us.  Not all
      callers need this, but I'm still hardcoding the flag inside rather than
      making it a parameter because a) this is going to stable, and b) those
      callers shouldn't really use rbd_obj_request_create() and will be fixed
      in the future.
      
      More memory allocation fixes will follow.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
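
      A hedged sketch of the allocation change described above (the function
      and cache names mirror rbd.c, but this is not the exact diff):

          static struct rbd_obj_request *
          rbd_obj_request_create(const char *object_name, u64 offset,
                                 u64 length, enum obj_request_type type)
          {
              struct rbd_obj_request *obj_request;

              /* main I/O path: GFP_NOIO keeps the allocator from recursing
               * back into block I/O while we are servicing a request */
              obj_request = kmem_cache_zalloc(rbd_obj_request_cache, GFP_NOIO);
              if (!obj_request)
                  return NULL;

              /* ... rest of the initialization is unchanged ... */
              return obj_request;
          }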
  9. 28 Jun, 2015 (6 commits)
  10. 26 Jun, 2015 (13 commits)
  11. 25 Jun, 2015 (8 commits)
    • rbd: queue_depth map option · b5584180
      Ilya Dryomov committed
      nr_requests (/sys/block/rbd<id>/queue/nr_requests) is pretty much
      irrelevant in the blk-mq case, because each driver sets its own maximum
      depth that it can handle, and that is the number of tags that gets
      preallocated on setup.  Users can't increase the queue depth beyond
      that value by writing to nr_requests.
      
      For rbd we are happy with the default BLKDEV_MAX_RQ (128) for most
      cases but we want to give users the opportunity to increase it.
      Introduce a new per-device queue_depth option to do just that:
      
          $ sudo rbd map -o queue_depth=1024 ...
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
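
      A hedged sketch of how such a queue_depth map option can be wired into
      blk-mq (field names mirror rbd.c; the exact parsing and validation
      shown here are assumptions, not the upstream diff):

          /* option parsing: reject nonsensical values */
          case Opt_queue_depth:
              if (intval < 1) {
                  pr_err("queue_depth out of range\n");
                  return -EINVAL;
              }
              rbd_opts->queue_depth = intval;
              break;

          /* queue setup: the tag set depth is what blk-mq preallocates */
          rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth;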
    • rbd: store rbd_options in rbd_device · d147543d
      Ilya Dryomov committed
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • rbd: terminate rbd_opts_tokens with Opt_err · 210c104c
      Ilya Dryomov committed
      Also nuke useless Opt_last_bool and don't break lines unnecessarily.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
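
      For reference, the match_table_t idiom the commit title refers to: the
      table has to end with a catch-all entry, here Opt_err, so that
      match_token() always finds a terminator.  This is an illustrative
      subset, not the full rbd table:

          enum {
              Opt_queue_depth,
              Opt_read_only,
              Opt_read_write,
              Opt_err
          };

          static match_table_t rbd_opts_tokens = {
              {Opt_queue_depth, "queue_depth=%d"},
              {Opt_read_only, "ro"},
              {Opt_read_write, "rw"},
              {Opt_err, NULL}     /* terminator */
          };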
    • rbd: bump queue_max_segments · d3834fef
      Ilya Dryomov committed
      The default queue_limits::max_segments value (BLK_MAX_SEGMENTS = 128)
      unnecessarily limits bio sizes to 512k (assuming 4k pages).  rbd, being
      a virtual block device, doesn't have any restrictions on the number of
      physical segments, so bump max_segments to max_hw_sectors, in theory
      allowing a sector per segment (although the only case where this
      matters that I can think of is some readv/writev style thing).  In
      practice this is going to give us 1M bios - the number of segments in
      a bio is limited in bio_get_nr_vecs() by BIO_MAX_PAGES = 256.

      Note that this doesn't result in any improvement on a typical direct
      sequential test.  This is because on a box with not too badly
      fragmented memory the default BLK_MAX_SEGMENTS is enough to produce
      requests of a full rbd object size.  The only difference is the size
      of the bios being merged - 512k vs 1M for something like
      
          $ dd if=/dev/zero of=/dev/rbd0 oflag=direct bs=$RBD_OBJ_SIZE
          $ dd if=/dev/rbd0 iflag=direct of=/dev/null bs=$RBD_OBJ_SIZE
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
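
      A hedged sketch of the queue limit change described above, as it would
      appear in rbd's queue setup (segment_size is the rbd object size in
      bytes and SECTOR_SIZE is 512; the surrounding context is assumed):

          blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
          /* was blk_queue_max_segments(q, BLK_MAX_SEGMENTS), i.e. 128 */
          blk_queue_max_segments(q, segment_size / SECTOR_SIZE);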
    • rbd: timeout watch teardown on unmap with mount_timeout · 2894e1d7
      Ilya Dryomov committed
      As part of the unmap sequence, the kernel client has to talk to the
      OSDs to tear down the watch on the header object.  If none of the OSDs
      are available it would hang forever, until interrupted by a signal -
      when that happens we follow through with the rest of the unmap
      procedure (i.e. unregister the device and put all the data structures)
      and the unmap is still considered successful (the rbd cli tool exits
      with 0).  The watch on the userspace side should eventually time out,
      so that's fine.

      This isn't very nice, because various userspace tools (the pacemaker
      rbd resource agent, for example) then have to worry about setting up
      their own timeouts.  Time the wait out with mount_timeout (60 seconds
      by default).
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Sage Weil <sage@redhat.com>
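
      A hedged sketch of the bounded wait described above, not the upstream
      diff: the helper name rbd_obj_request_wait_timeout() is hypothetical,
      and it assumes mount_timeout is already stored in jiffies and run
      through the ceph_timeout_jiffies() helper from the libceph commit
      below:

          static int rbd_obj_request_wait_timeout(struct rbd_obj_request *obj_request,
                                                  unsigned long timeout)
          {
              long ret;

              /* previously an unbounded wait for the teardown reply */
              ret = wait_for_completion_interruptible_timeout(
                              &obj_request->completion,
                              ceph_timeout_jiffies(timeout));
              if (ret <= 0)
                      /* interrupted (-ERESTARTSYS) or timed out (0) */
                      return ret ?: -ETIMEDOUT;
              return 0;
          }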
    • libceph: store timeouts in jiffies, verify user input · a319bf56
      Ilya Dryomov committed
      There are currently three libceph-level timeouts that the user can
      specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
      these are in seconds and no checking is done on user input: negative
      values are accepted, we multiply them all by HZ which may or may not
      overflow, arbitrarily large jiffies then get added together, etc.
      
      There is also a bug in the way mount_timeout=0 is handled.  It's
      supposed to mean "infinite timeout", but that's not how the wait.h APIs
      treat it, and so __ceph_open_session(), for example, will busy-loop
      without much chance of being interrupted if none of the ceph-mons are
      there.
      
      Fix all this by verifying user input, storing timeouts capped by
      msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
      helper for all user-specified waits to handle infinite timeouts
      correctly.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
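
      A hedged sketch of the two pieces described above: parsing that bounds
      the user input before converting seconds to jiffies, and a helper that
      maps the "infinite" value 0 to MAX_SCHEDULE_TIMEOUT.  The validation
      bounds shown are assumptions:

          /* option parsing: seconds from the user, stored as jiffies */
          case Opt_mount_timeout:
              if (intval < 0 || intval > INT_MAX / 1000) {
                  pr_err("mount_timeout out of range\n");
                  return -EINVAL;
              }
              opt->mount_timeout = msecs_to_jiffies(intval * 1000);
              break;

          /* 0 means "no timeout": wait_*_timeout() callers wait forever */
          static inline unsigned long ceph_timeout_jiffies(unsigned long timeout)
          {
              return timeout ?: MAX_SCHEDULE_TIMEOUT;
          }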
    • libceph: allow setting osd_req_op's flags · 144cba14
      Yan, Zheng committed
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • libnvdimm, pmem: move pmem to drivers/nvdimm/ · 18da2c9e
      Dan Williams committed
      Prepare the pmem driver to consume PMEM namespaces emitted by regions of
      an nvdimm_bus instance.  No functional change.
      Acked-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  12. 24 Jun, 2015 (3 commits)