  1. Aug 09, 2014 (1 commit)
  2. Aug 07, 2014 (4 commits)
    • zram: replace global tb_lock with fine grain lock · d2d5e762
      Committed by Weijie Yang
      Currently, we use a rwlock tb_lock to protect concurrent access to the
      whole zram meta table.  However, according to the actual access pattern,
      there is only a small chance that upper layers access the same
      table[index] concurrently, so the current lock granularity is too coarse.
      
      The idea of the optimization is to change the lock granularity from the
      whole meta table to a per-entry lock (table -> table[index]), so that we
      still protect concurrent access to the same table[index] while allowing
      the maximum concurrency.
      
      With this in mind, several kinds of locks which could be used as a
      per-entry lock were tested and compared:
      
      Test environment:
      x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
      kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
      
      iozone test:
      iozone -t 4 -R -r 16K -s 200M -I +Z
      (1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)
      
            Test       base      CAS    spinlock    rwlock   bit_spinlock
      -------------------------------------------------------------------
       Initial write  1381094   1425435   1422860   1423075   1421521
             Rewrite  1529479   1641199   1668762   1672855   1654910
                Read  8468009  11324979  11305569  11117273  10997202
             Re-read  8467476  11260914  11248059  11145336  10906486
        Reverse Read  6821393   8106334   8282174   8279195   8109186
         Stride read  7191093   8994306   9153982   8961224   9004434
         Random read  7156353   8957932   9167098   8980465   8940476
      Mixed workload  4172747   5680814   5927825   5489578   5972253
        Random write  1483044   1605588   1594329   1600453   1596010
              Pwrite  1276644   1303108   1311612   1314228   1300960
               Pread  4324337   4632869   4618386   4457870   4500166
      
      To increase the likelihood of concurrent access to the same table[index],
      set zram to a small disksize (10MB) and let the threads run with a large
      loop count.
      
      fio test:
      fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
      --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
      --filename=/dev/zram0 --name=seq-write --rw=write --stonewall
      --name=seq-read --rw=read --stonewall --name=seq-readwrite
      --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
      (10MB zram raw block device, take the average of 10 tests, KB/s)
      
          Test     base     CAS    spinlock    rwlock  bit_spinlock
      -------------------------------------------------------------
      seq-write   933789   999357   1003298    995961   1001958
       seq-read  5634130  6577930   6380861   6243912   6230006
         seq-rw  1405687  1638117   1640256   1633903   1634459
        rand-rw  1386119  1614664   1617211   1609267   1612471
      
      All of the locking methods perform better than the baseline; however, it
      is hard to say which method is the most appropriate.
      
      On the other hand, zram is mostly used on small embedded systems, so we
      don't want to increase the memory footprint.
      
      This patch picks the bit_spinlock method and packs the object size and
      page flags into an unsigned long table.value, so that no extra memory
      overhead is added on either 32-bit or 64-bit systems.
      
      Finally, even though the different kinds of locks perform differently,
      the difference can be ignored: if zram is used as a swap device, the
      swap subsystem prevents concurrent access to the same swap slot; if
      zram is used as a block device with a filesystem on top, the filesystem
      and the page cache mostly prevent concurrent access to the same block.
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
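      As a rough sketch of the packing described above (the struct, macro and
      helper names here are illustrative, not taken from the patch): the low
      bits of table[index].value hold the compressed object size, the higher
      bits hold per-entry flags, and one flag bit doubles as a bit spinlock,
      so no separate lock word is needed on 32-bit or 64-bit systems.

      #include <linux/types.h>
      #include <linux/bit_spinlock.h>

      struct zram_table_entry {
              unsigned long handle;
              unsigned long value;    /* [flags | object size], no separate lock */
      };

      #define ZRAM_FLAG_SHIFT  24                      /* low 24 bits: object size */
      #define ZRAM_ACCESS      (ZRAM_FLAG_SHIFT + 1)   /* flag bit used as the lock */

      static void zram_lock_entry(struct zram_table_entry *entry)
      {
              bit_spin_lock(ZRAM_ACCESS, &entry->value);
      }

      static void zram_unlock_entry(struct zram_table_entry *entry)
      {
              bit_spin_unlock(ZRAM_ACCESS, &entry->value);
      }

      static size_t zram_entry_obj_size(struct zram_table_entry *entry)
      {
              return entry->value & ((1UL << ZRAM_FLAG_SHIFT) - 1);
      }
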
    • zram: use size_t instead of u16 · 023b409f
      Committed by Minchan Kim
      Some architectures (e.g., hexagon and PowerPC) can use a PAGE_SHIFT of
      16 or more.  In these cases u16 is not large enough to represent a
      compressed page's size, so use size_t instead.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Weijie Yang <weijie.yang@samsung.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
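      A standalone illustration (plain C, not zram code) of why u16 overflows
      once PAGE_SHIFT reaches 16: an incompressible page is stored as-is, so
      the recorded size can equal PAGE_SIZE, which no longer fits in 16 bits.

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              unsigned int page_shift = 16;               /* e.g. some hexagon/PowerPC configs */
              size_t page_size = (size_t)1 << page_shift; /* 65536 */
              uint16_t clen16 = (uint16_t)page_size;      /* wraps to 0: u16 max is 65535 */
              size_t clen = page_size;                    /* size_t keeps the real value */

              printf("page_size=%zu u16=%u size_t=%zu\n", page_size, clen16, clen);
              return 0;
      }
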
    • zram: remove unused SECTOR_SIZE define · a830eff7
      Committed by Sergey Senozhatsky
      Drop the SECTOR_SIZE define, because it is not used.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: rename struct `table' to `zram_table_entry' · cb8f2eec
      Committed by Sergey Senozhatsky
      Andrew Morton has recently noted that `struct table' actually represents
      a table entry and, thus, should be renamed.  Rename it to
      `zram_table_entry'.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Jul 24, 2014 (1 commit)
  4. Jul 10, 2014 (1 commit)
  5. Jul 04, 2014 (1 commit)
  6. Jun 25, 2014 (1 commit)
  7. Jun 23, 2014 (1 commit)
    • rbd: handle parent_overlap on writes correctly · 9638556a
      Committed by Ilya Dryomov
      The following check in rbd_img_obj_request_submit()
      
          rbd_dev->parent_overlap <= obj_request->img_offset
      
      allows the fall through to the non-layered write case even if both
      parent_overlap and obj_request->img_offset belong to the same RADOS
      object.  This leads to data corruption, because the area to the left of
      parent_overlap ends up unconditionally zero-filled instead of being
      populated with parent data.  Suppose we want to write 1M to offset 6M
      of image bar, which is a clone of foo@snap; object_size is 4M,
      parent_overlap is 5M:
      
          rbd_data.<id>.0000000000000001
           ---------------------|----------------------|------------
          | should be copyup'ed | should be zeroed out | write ...
           ---------------------|----------------------|------------
         4M                    5M                     6M
                          parent_overlap    obj_request->img_offset
      
      4..5M should be copyup'ed from foo, yet it is zero-filled, just like
      5..6M is.
      
      Given that the only striping mode the kernel client currently supports
      is chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
      parent_overlap up to the next object boundary for the purposes of the
      overlap check.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
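      A minimal sketch of that rounding (the helper name is hypothetical, not
      the actual rbd code): with chunking, rounding the overlap up to the next
      object boundary makes the check operate on whole objects, so a partially
      covered object is still treated as layered and gets copyup'ed.

      #include <stdint.h>

      /* round parent_overlap up to the next object boundary
       * (object_size is a power of two) */
      static uint64_t rbd_rounded_overlap(uint64_t parent_overlap,
                                          uint64_t object_size)
      {
              return (parent_overlap + object_size - 1) & ~(object_size - 1);
      }

      /*
       * With the commit's example (object_size = 4M, parent_overlap = 5M) the
       * rounded overlap is 8M, so a write at img_offset = 6M no longer falls
       * through to the plain write path and 4..5M is copyup'ed from the parent.
       */
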
  8. Jun 18, 2014 (1 commit)
  9. Jun 17, 2014 (1 commit)
  10. Jun 14, 2014 (1 commit)
  11. Jun 13, 2014 (2 commits)
  12. Jun 12, 2014 (1 commit)
  13. Jun 11, 2014 (3 commits)
  14. Jun 07, 2014 (2 commits)
  15. Jun 06, 2014 (6 commits)
    • block: add blk_rq_set_block_pc() · f27b087b
      Committed by Jens Axboe
      With the optimizations around not clearing the full request at alloc
      time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
      up to the user allocating the request.
      
      Add a blk_rq_set_block_pc() that sets the command type to
      REQ_TYPE_BLOCK_PC, and properly initializes the members associated
      with this type of request. Update callers to use this function instead
      of manipulating rq->cmd_type directly.
      
      Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
      attempt.
      Signed-off-by: Jens Axboe <axboe@fb.com>
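      An approximate sketch of the helper's shape and of a caller, assembled
      from the description above and the struct request layout of that era
      (not copied from the patch, so treat the exact fields as assumptions):

      #include <linux/blkdev.h>

      void blk_rq_set_block_pc(struct request *rq)
      {
              rq->cmd_type = REQ_TYPE_BLOCK_PC;
              rq->__data_len = 0;
              rq->__sector = (sector_t)-1;
              rq->bio = rq->biotail = NULL;
              memset(rq->__cmd, 0, sizeof(rq->__cmd));
      }

      /* caller side (e.g. a SCSI ioctl path): allocate the request first,
       * then mark and initialize it as a BLOCK_PC request */
      rq = blk_get_request(q, WRITE, GFP_KERNEL);
      blk_rq_set_block_pc(rq);
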
    • rbd: fix ida/idr memory leak · ffe312cf
      Committed by Ilya Dryomov
      ida_destroy() needs to be called on module exit to release ida caches.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
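      A minimal sketch of the pattern (the ida name is illustrative): an ida
      keeps internal caches, so the module exit path should call ida_destroy()
      to release them.

      #include <linux/module.h>
      #include <linux/idr.h>

      static DEFINE_IDA(rbd_dev_id_ida);      /* illustrative name */

      static void __exit rbd_exit(void)
      {
              /* ... unregister the block driver, sysfs entries, etc. ... */
              ida_destroy(&rbd_dev_id_ida);   /* free the ida's cached nodes */
      }
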
    • rbd: use reference counts for image requests · 0f2d5be7
      Committed by Alex Elder
      Each image request contains a reference count, but to date it has
      not actually been used.  (I think this was just an oversight.) A
      recent report involving rbd failing an assertion shed light on why
      and where we need to use these reference counts.
      
      Every OSD request associated with an object request uses
      rbd_osd_req_callback() as its callback function.  That function will
      call a helper function (dependent on the type of OSD request) that
      will set the object request's "done" flag if appropriate.  If that
      "done" flag is set, the object request is passed to
      rbd_obj_request_complete().
      
      In rbd_obj_request_complete(), requests are processed in sequential
      order.  So if an object request completes before one of its
      predecessors in the image request, the completion is deferred.
      Otherwise, if it's a completing object's "turn" to be completed, it
      is passed to rbd_img_obj_end_request(), which records the result of
      the operation, accumulates transferred bytes, and so on.  Next, the
      successor to this request is checked and if it is marked "done",
      (deferred) completion processing is performed on that request, and
      so on.  If the last object request in an image request is completed,
      rbd_img_request_complete() is called, which (typically) destroys
      the image request.
      
      There is a race here, however.  The instant an object request is
      marked "done" it can be provided (by a thread handling completion of
      one of its predecessor operations) to rbd_img_obj_end_request(),
      which (for the last request) can then lead to the image request
      getting torn down.  And this can happen *before* that object has
      itself entered rbd_img_obj_end_request().  As a result, once it
      *does* enter that function, the image request (and even the object
      request itself) may have been freed and become invalid.
      
      All that's necessary to avoid this is to properly count references
      to the image requests.  We tear down an image request's object
      requests all at once--only when the entire image request has
      completed.  So there's no need for an image request to count
      references for its object requests.  However, we don't want an
      image request to go away until the last of its object requests
      has passed through rbd_img_obj_callback().  In other words,
      we don't want rbd_img_request_complete() to necessarily
      result in the image request being destroyed, because it may
      get called before we've finished processing on all of its
      object requests.
      
      So the fix is to add a reference to an image request for
      each of its object requests.  The reference can be viewed
      as representing an object request that has not yet finished
      its call to rbd_img_obj_callback().  That is emphasized by
      getting the reference right after assigning that as the image
      object's callback function.  The corresponding release of that
      reference is done at the end of rbd_img_obj_callback(), which
      every image object request passes through exactly once.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
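      A sketch of the counting scheme described above (hedged: the get/put
      helper names follow the commit text, the struct layout is elided): each
      object request takes a reference on its image request right after
      rbd_img_obj_callback() is installed as its callback, and drops it at the
      end of that callback, so the image request cannot be torn down while any
      object request is still inside it.

      #include <linux/kref.h>
      #include <linux/slab.h>

      struct rbd_img_request {
              struct kref kref;
              /* ... remaining fields elided ... */
      };

      static void rbd_img_request_destroy(struct kref *kref)
      {
              struct rbd_img_request *img_request =
                      container_of(kref, struct rbd_img_request, kref);

              /* tear down the object requests, then free the image request */
              kfree(img_request);
      }

      static void rbd_img_request_get(struct rbd_img_request *img_request)
      {
              kref_get(&img_request->kref);
      }

      static void rbd_img_request_put(struct rbd_img_request *img_request)
      {
              kref_put(&img_request->kref, rbd_img_request_destroy);
      }
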
    • rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() · b30a01f2
      Committed by Ilya Dryomov
      The osd_request, along with the r_request and r_reply messages attached
      to it, is leaked in __rbd_dev_header_watch_sync() if the requested image
      doesn't exist.  This is because lingering requests are special and get
      an extra ref in the reply path.  Fix it by unregistering the linger
      request on the error path and splitting __rbd_dev_header_watch_sync()
      into two functions to keep it maintainable.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
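      A hedged sketch of the error-path idea (structure only; the real code
      lives in __rbd_dev_header_watch_sync() and uses rbd's own wrappers, and
      the libceph calls here are assumptions from that kernel era): a watch is
      registered as a lingering OSD request, which holds an extra reference,
      so a failed watch setup must also unregister the linger.

      #include <linux/ceph/osd_client.h>

      static int setup_watch(struct ceph_osd_client *osdc,
                             struct ceph_osd_request *osd_req)
      {
              int ret;

              ceph_osdc_set_request_linger(osdc, osd_req);

              ret = ceph_osdc_start_request(osdc, osd_req, false);
              if (ret)
                      return ret;

              ret = ceph_osdc_wait_request(osdc, osd_req);
              if (ret) {
                      /* e.g. the image doesn't exist: drop the linger ref too */
                      ceph_osdc_unregister_linger_request(osdc, osd_req);
                      return ret;
              }

              return 0;
      }
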
    • rbd: make sure we have latest osdmap on 'rbd map' · 30ba1f02
      Committed by Ilya Dryomov
      Given an existing idle mapping (img1), mapping an image (img2) in
      a newly created pool (pool2) fails:
      
          $ ceph osd pool create pool1 8 8
          $ rbd create --size 1000 pool1/img1
          $ sudo rbd map pool1/img1
          $ ceph osd pool create pool2 8 8
          $ rbd create --size 1000 pool2/img2
          $ sudo rbd map pool2/img2
          rbd: sysfs write failed
          rbd: map failed: (2) No such file or directory
      
      This is because client instances are shared by default and we don't
      request an osdmap update when bumping a ref on an existing client.  The
      fix is to use the mon_get_version request to see if the osdmap we have
      is the latest, and block until the requested update is received if it's
      not.
      
      Fixes: http://tracker.ceph.com/issues/8184
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
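      A hedged sketch of the approach (the libceph helper names are
      assumptions recalled from that kernel era, not taken from this patch):
      ask the monitors for the newest osdmap epoch via mon_get_version and,
      if the local map is older, subscribe to the next map and wait for it
      before resolving the pool.

      #include <linux/ceph/libceph.h>
      #include <linux/ceph/mon_client.h>

      static int wait_for_latest_osdmap(struct ceph_client *client)
      {
              unsigned long timeout = client->options->mount_timeout * HZ;
              u64 newest_epoch;
              int ret;

              /* a mon_get_version("osdmap") returns the cluster's newest map epoch */
              ret = ceph_monc_do_get_version(&client->monc, "osdmap", &newest_epoch);
              if (ret)
                      return ret;

              if (client->osdc.osdmap->epoch >= newest_epoch)
                      return 0;       /* our map is already current */

              /* otherwise subscribe to the next osdmap and block until it arrives */
              ceph_monc_request_next_osdmap(&client->monc);
              return ceph_monc_wait_osdmap(&client->monc, newest_epoch, timeout);
      }
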
    • rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · 461f758a
      Committed by Duan Jiong
      This patch fixes a coccinelle warning about using IS_ERR and PTR_ERR
      where PTR_ERR_OR_ZERO suffices.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
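      The coccinelle pattern in question, shown generically (not the specific
      rbd call site):

      /* before */
      if (IS_ERR(ptr))
              return PTR_ERR(ptr);
      return 0;

      /* after */
      return PTR_ERR_OR_ZERO(ptr);
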
  16. Jun 05, 2014 (4 commits)
  17. Jun 04, 2014 (9 commits)