Commits · 95afae481414cbdb0567bf82d5e5077c3ac9da20 · openanolis / cloud-kernel

06 Oct, 2014 1 commit

xen: remove DEFINE_XENBUS_DRIVER() macro · 95afae48

Authored by David Vrabel on Sep 08, 2014

The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse
errors.

Replace the uses with standard structure definitions instead.  This is
similar to pci and usb device registration.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

10 Sep, 2014 2 commits

rbd: fix error return code in rbd_dev_device_setup() · 255939e7

Authored by Wei Yongjun on Aug 13, 2014

Fix to return -ENOMEM from the workqueue alloc error handling
case instead of 0, as done elsewhere in this function.
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
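
The class of bug fixed by the commit above can be sketched in userland C. This is only an illustration under stated assumptions: `fake_alloc`, `device_setup_buggy` and `device_setup_fixed` are hypothetical names, not functions from drivers/block/rbd.c. The point is that in a goto-unwind function a failed allocation must set the error code before jumping, otherwise the caller sees the 0 left over from an earlier successful step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation that may fail. */
static void *fake_alloc(int should_fail)
{
    return should_fail ? NULL : malloc(16);
}

/* Buggy shape: ret still holds 0 when the allocation fails. */
int device_setup_buggy(int fail_alloc)
{
    int ret = 0;                  /* an earlier step succeeded */
    void *wq = fake_alloc(fail_alloc);

    if (!wq)
        goto err_out;             /* ret not updated: caller sees success */

    free(wq);
    return 0;

err_out:
    return ret;
}

/* Fixed shape: set the error code before jumping to the unwind label. */
int device_setup_fixed(int fail_alloc)
{
    int ret = 0;
    void *wq = fake_alloc(fail_alloc);

    if (!wq) {
        ret = -ENOMEM;
        goto err_out;
    }

    free(wq);
    return 0;

err_out:
    return ret;
}
```

With a failing allocation, `device_setup_buggy(1)` returns 0 while `device_setup_fixed(1)` returns `-ENOMEM`.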

rbd: avoid format-security warning inside alloc_workqueue() · 58d1362b

Authored by Ilya Dryomov on Aug 12, 2014

drivers/block/rbd.c: In function ‘rbd_dev_device_setup’:
drivers/block/rbd.c:5090:19: warning: format not a string literal and no format arguments [-Wformat-security]
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
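
The -Wformat-security warning quoted above fires when a non-literal string is used as a format with no arguments. A minimal userland sketch of the usual remedy follows; `format_name` is a hypothetical printf-style helper standing in for a format-taking API such as alloc_workqueue(), and `name_from_disk` is likewise an assumed name. The commit itself is not reproduced here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printf-style helper standing in for a kernel API
 * that takes a format string. */
static int format_name(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/*
 * Safe pattern: the caller-supplied name is an argument to a literal
 * "%s" format, so any '%' in it is copied verbatim instead of being
 * interpreted as a conversion specifier.
 */
int name_from_disk(char *buf, size_t len, const char *disk_name)
{
    /* The risky form would be: format_name(buf, len, disk_name); */
    return format_name(buf, len, "%s", disk_name);
}
```

Passing a name such as "rbd%d" through `name_from_disk()` yields the same five characters in the buffer, with no attempt to consume a missing integer argument.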

03 Sep, 2014 1 commit

blk-mq: pass along blk_mq_alloc_tag_set return values · dc501dc0

Authored by Robert Elliott on Sep 02, 2014

Two of the blk-mq based drivers do not pass back the return value
from blk_mq_alloc_tag_set, instead just returning -ENOMEM.

blk_mq_alloc_tag_set returns -EINVAL if the number of queues or
queue depth is bad.  -ENOMEM implies that retrying after freeing some
memory might be more successful, but that won't ever change
in the -EINVAL cases.

Change the null_blk and mtip32xx drivers to pass along
the return value.
Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
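
The difference described in the commit above can be sketched as follows. This is a simplified userland model, not the driver code: `fake_alloc_tag_set` is a hypothetical stand-in with an invented signature (the real blk_mq_alloc_tag_set takes a tag-set structure), and `driver_init_old` / `driver_init_new` are assumed names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for blk_mq_alloc_tag_set(): rejects bad
 * parameters with -EINVAL and reports allocation failure with -ENOMEM.
 */
static int fake_alloc_tag_set(unsigned int nr_queues, unsigned int depth,
                              void **out)
{
    if (nr_queues == 0 || depth == 0)
        return -EINVAL;
    *out = malloc((size_t)nr_queues * depth);
    if (!*out)
        return -ENOMEM;
    return 0;
}

/* Before: every failure is flattened into -ENOMEM. */
int driver_init_old(unsigned int nr_queues, unsigned int depth)
{
    void *tags;

    if (fake_alloc_tag_set(nr_queues, depth, &tags))
        return -ENOMEM;
    free(tags);
    return 0;
}

/* After: the callee's error code is passed back unchanged. */
int driver_init_new(unsigned int nr_queues, unsigned int depth)
{
    void *tags;
    int rv = fake_alloc_tag_set(nr_queues, depth, &tags);

    if (rv)
        return rv;
    free(tags);
    return 0;
}
```

With zero queues, the old shape reports `-ENOMEM` (suggesting a retry might help) while the new shape reports `-EINVAL`, which tells the caller that retrying cannot succeed.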

30 Aug, 2014 1 commit

zram: fix incorrect stat with failed_reads · 0cf1e9d6

Authored by Chao Yu on Aug 29, 2014

Since we allocate a temporary buffer in zram_bvec_read to handle partial
page operations in commit 924bd88d ("Staging: zram: allow partial
page operations"), our ->failed_reads value may be incorrect as we do
not increase its value when failing to allocate the temporary buffer.

Let's fix this issue and correct the annotation of failed_reads.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Aug, 2014 2 commits

brd: add ram disk visibility option · aeac3181

Authored by Dmitry Monakhov on Aug 18, 2014

Currently the RAM disk is not visible in /proc/partitions. This was
done for compatibility reasons in commit 53978d0a. But some utilities
expect the disk to be present in /proc/partitions.
Add a module option that lets the administrator choose the visibility behaviour.
By default, the old behaviour is preserved.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: systemace: Remove .owner field for driver · ffb5db73

Authored by Michal Simek on Aug 13, 2014

There is no need to init .owner field.

Based on the patch from Peter Griffin <peter.griffin@linaro.org>
"mmc: remove .owner field for drivers using module_platform_driver"

"This patch removes the superfluous .owner field for drivers which
use the module_platform_driver API, as this is overridden in
platform_driver_register anyway."
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

13 Aug, 2014 1 commit

PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use · 9baa3c34

Authored by Benoit Taine on Aug 08, 2014

We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to
meet kernel coding style guidelines.  This issue was reported by checkpatch.

A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):

// <smpl>

@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@

- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;

// </smpl>

[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

09 Aug, 2014 1 commit

block: use pci_zalloc_consistent · a5bbf616

Authored by Joe Perches on Aug 08, 2014

Remove the now unnecessary memset too.
Signed-off-by: Joe Perches <joe@perches.com>
Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Aug, 2014 7 commits

rbd: remove extra newlines from rbd_warn() messages · 9584d508

Authored by Ilya Dryomov on Jul 11, 2014

rbd_warn() string should be a single line - rbd_warn() appends \n.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

rbd: allocate img_request with GFP_NOIO instead of GFP_ATOMIC · 7a716aac

Authored by Ilya Dryomov on Aug 05, 2014

Now that rbd_img_request_create() is called from work functions, no
need to use GFP_ATOMIC.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: rework rbd_request_fn() · bc1ecc65

Authored by Ilya Dryomov on Aug 04, 2014

While it was never a good idea to sleep in request_fn(), commit
34c6bc2c ("locking/mutexes: Add extra reschedule point") made it
a *bad* idea.  mutex_lock() since 3.15 may reschedule *before* putting
the task on the mutex wait queue, which for tasks in !TASK_RUNNING state
means blocking forever.  request_fn() may be called with !TASK_RUNNING on
the way to schedule() in io_schedule().

Offload request handling to a workqueue, one per rbd device, to avoid
calling blocking primitives from rbd_request_fn().

Fixes: http://tracker.ceph.com/issues/8818

Cc: stable@vger.kernel.org # 3.16, needs backporting for 3.15
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Tested-by: Eric Eastman <eric0e@aol.com>
Tested-by: Greg Wilson <greg.wilson@keepertech.com>
Reviewed-by: Alex Elder <elder@linaro.org>

zram: replace global tb_lock with fine grain lock · d2d5e762

Authored by Weijie Yang on Aug 06, 2014

Currently, we use a rwlock tb_lock to protect concurrent access to the
whole zram meta table.  However, according to the actual access model,
there is only a small chance that upper-layer users access the same
table[index] concurrently, so the current lock granularity is too coarse.

The idea of the optimization is to change the lock granularity from the
whole meta table to a single table entry (table -> table[index]), so that
we protect concurrent access to the same table[index] while allowing
maximum concurrency.

With this in mind, several kinds of locks which could be used as a
per-entry lock were tested and compared:

Test environment:
x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.

iozone test:
iozone -t 4 -R -r 16K -s 200M -I +Z
(1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)

      Test       base      CAS    spinlock    rwlock   bit_spinlock
-------------------------------------------------------------------
 Initial write  1381094   1425435   1422860   1423075   1421521
       Rewrite  1529479   1641199   1668762   1672855   1654910
          Read  8468009  11324979  11305569  11117273  10997202
       Re-read  8467476  11260914  11248059  11145336  10906486
  Reverse Read  6821393   8106334   8282174   8279195   8109186
   Stride read  7191093   8994306   9153982   8961224   9004434
   Random read  7156353   8957932   9167098   8980465   8940476
Mixed workload  4172747   5680814   5927825   5489578   5972253
  Random write  1483044   1605588   1594329   1600453   1596010
        Pwrite  1276644   1303108   1311612   1314228   1300960
         Pread  4324337   4632869   4618386   4457870   4500166

To increase the likelihood of concurrent access to the same table[index],
zram was given a small disksize (10MB) and the threads were run with a
large loop count.

fio test:
fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
--scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
--filename=/dev/zram0 --name=seq-write --rw=write --stonewall
--name=seq-read --rw=read --stonewall --name=seq-readwrite
--rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
(10MB zram raw block device, take the average of 10 tests, KB/s)

    Test     base     CAS    spinlock    rwlock  bit_spinlock
-------------------------------------------------------------
seq-write   933789   999357   1003298    995961   1001958
 seq-read  5634130  6577930   6380861   6243912   6230006
   seq-rw  1405687  1638117   1640256   1633903   1634459
  rand-rw  1386119  1614664   1617211   1609267   1612471

All the optimization methods perform better than the base;
however, it is hard to say which method is the most appropriate.

On the other hand, zram is mostly used on small embedded systems, so we
don't want to increase the memory footprint.

This patch picks the bit_spinlock method and packs the object size and
page_flag into an unsigned long table.value, so as not to increase memory
overhead on either 32-bit or 64-bit systems.

Finally, even though different kinds of locks perform differently, we can
ignore this difference: if zram is used as a swap device, the swap
subsystem prevents concurrent access to the same swap slot; if zram is
used as a block device with a filesystem on it, the upper filesystem and
the page cache also mostly prevent concurrent access to the same block.
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
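
The idea of packing the object size, the flags and a per-entry lock bit into a single word, as described in the commit above, can be sketched in userland C11. This is a sketch under assumptions and not the zram code: the bit layout (`FLAG_SHIFT`, `FLAG_ZERO`, `FLAG_LOCK`), the struct and all function names are hypothetical, and C11 atomics stand in for the kernel's bit_spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed layout, for illustration only: the low 24 bits hold the
 * object size, the bits above hold per-entry flags. */
#define FLAG_SHIFT 24
#define SIZE_MASK  ((1UL << FLAG_SHIFT) - 1)
#define FLAG_ZERO  (1UL << FLAG_SHIFT)        /* hypothetical flag */
#define FLAG_LOCK  (1UL << (FLAG_SHIFT + 1))  /* bit used as the lock */

struct table_entry {
    unsigned long handle;
    atomic_ulong value;   /* size and flags packed into one word */
};

/* Spin until this entry's lock bit is acquired. */
void entry_lock(struct table_entry *e)
{
    while (atomic_fetch_or_explicit(&e->value, FLAG_LOCK,
                                    memory_order_acquire) & FLAG_LOCK)
        ;  /* another thread holds the bit; retry */
}

/* Clear the lock bit, leaving size and other flags untouched. */
void entry_unlock(struct table_entry *e)
{
    atomic_fetch_and_explicit(&e->value, ~FLAG_LOCK, memory_order_release);
}

/* Caller must hold the entry lock. Keeps the flag bits, replaces the size. */
void entry_set_size(struct table_entry *e, size_t size)
{
    unsigned long v = atomic_load(&e->value);

    assert(size <= SIZE_MASK);
    atomic_store(&e->value, (v & ~SIZE_MASK) | (unsigned long)size);
}

size_t entry_get_size(struct table_entry *e)
{
    return (size_t)(atomic_load(&e->value) & SIZE_MASK);
}

void entry_set_flag(struct table_entry *e, unsigned long flag)
{
    atomic_fetch_or(&e->value, flag);
}

int entry_test_flag(struct table_entry *e, unsigned long flag)
{
    return (atomic_load(&e->value) & flag) != 0;
}
```

Because the lock lives in a spare bit of a word that is needed anyway, two threads touching different entries never contend, and no separate lock field is added to each entry.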

zram: use size_t instead of u16 · 023b409f

Authored by Minchan Kim on Aug 06, 2014

Some architectures (e.g., hexagon and PowerPC) could use PAGE_SHIFT of 16
or more.  In these cases u16 is not sufficiently large to represent a
compressed page's size, so use size_t.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
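
The arithmetic behind the commit above is easy to check in userland C. The helper names below are hypothetical and `uint16_t` stands in for the kernel's u16: with PAGE_SHIFT of 16 the page size is 65536, one more than the largest value 16 bits can hold, so a size equal to a full page wraps to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page size for a given PAGE_SHIFT, e.g. 12 -> 4 KiB, 16 -> 64 KiB. */
size_t page_size_for_shift(unsigned int page_shift)
{
    return (size_t)1 << page_shift;
}

/* Storing a size in 16 bits silently keeps only the low 16 bits. */
uint16_t size_as_u16(size_t size)
{
    return (uint16_t)size;
}
```

For a 4 KiB page the round trip is lossless, but `size_as_u16(page_size_for_shift(16))` yields 0, which is why a wider type is needed on such architectures.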

zram: remove unused SECTOR_SIZE define · a830eff7

Authored by Sergey Senozhatsky on Aug 06, 2014

Drop SECTOR_SIZE define, because it's not used.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

zram: rename struct `table' to `zram_table_entry' · cb8f2eec

Authored by Sergey Senozhatsky on Aug 06, 2014

Andrew Morton has recently noted that `struct table' actually represents
a table entry and, thus, should be renamed.  Rename it to `zram_table_entry'.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 Jul, 2014 8 commits

rbd: take snap_id into account when reading in parent info · 4d9b67cd

Authored by Ilya Dryomov on Jul 24, 2014

If we are mapping a snapshot, we must read in the parent_overlap value
of that snapshot instead of that of the base image.  Not doing so may
in particular result in us returning zeros instead of user data:

    # cat overlap-snap.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    echo "Base image"
    dd if=$FOO_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    rbd snap create bar@snap
    BAR_DEV=$(rbd map bar@snap)
    echo "Snapshot"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd resize --allow-shrink --size 4 bar
    echo "Snapshot after base image resize"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd

    # ./overlap-snap.sh
    Base image
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Snapshot
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Resizing image: 100% complete...done.
    Snapshot after base image resize
    0000000: e781 e33b d34b 2225 0000 0000 0000 0000  ...;.K"%........

Even though bar@snap is taken with the old bar parent_overlap (8M),
reads from bar@snap beyond the new bar parent_overlap (4M) return
zeroes.  Fix it.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
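
The xxd output in the commit above can be explained with a little arithmetic: the 16-byte read starts 8 bytes below the 4M boundary, so with a 4M overlap only the first 8 bytes can come from the parent. The helper below is a hypothetical illustration of that calculation, not a function from rbd.c, and it assumes the clone itself holds no data for the range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes of the read [offset, offset + length) that fall below
 * parent_overlap and can therefore be satisfied from the parent image.
 */
uint64_t bytes_within_overlap(uint64_t offset, uint64_t length,
                              uint64_t parent_overlap)
{
    if (offset >= parent_overlap)
        return 0;
    if (length > parent_overlap - offset)
        return parent_overlap - offset;
    return length;
}
```

With offset (4 << 20) - 8 and length 16, an 8M overlap covers all 16 bytes, while a 4M overlap covers only 8, matching the 8 bytes of data followed by 8 zero bytes in the last dump.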

rbd: do not read in parent info before snap context · e8f59b59

Authored by Ilya Dryomov on Jul 24, 2014

Currently rbd_dev_v2_header_info() reads in parent info before the snap
context is read in. This is wrong, because we may need to look at the
parent_overlap value of the snapshot instead of that of the base
image, for example when mapping a snapshot - see next commit. (When
mapping a snapshot, all we have is its name and we need the snap context
to translate that name into an id to know which parent info to look
for.)

The approach taken here is to make sure rbd_dev_v2_parent_info() is
called after the snap context has been read in. The other approach
would be to add a parent_overlap field to struct rbd_mapping and
maintain it the same way rbd_mapping::size is maintained. The reason
I chose the first approach is that the value of keeping around both
base image values and the actual mapping values is unclear to me.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: update mapping size only on refresh · 5ff1108c