提交 · 28bc3b8c71cda033a4c013131c635d1148889824 · openeuler / Kernel

26 11月, 2015 6 次提交

drbd: Fix locking across all resources · 28bc3b8c

由 Andreas Gruenbacher 提交于 8月 14, 2014

Instead of using a rwlock for synchronizing state changes across
resources, take the request locks of all resources for global state
changes.  Use resources_mutex to serialize global state changes.

This means that taking the request lock of a resource is now enough to
prevent changes of that resource.  (Previously, a read lock on the
global state lock was needed as well.)
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

28bc3b8c

drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed() · 1ec317d3

由 Andreas Gruenbacher 提交于 8月 14, 2014

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

1ec317d3

drbd: Move enum write_ordering_e to drbd.h · f6ba8636

由 Andreas Gruenbacher 提交于 8月 13, 2014

Also change the enum values to all-capital letters.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f6ba8636

drbd: Get rid of some first_peer_device() calls · 5dd2ca19

由 Andreas Gruenbacher 提交于 8月 11, 2014

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

5dd2ca19

drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync() · 2e9ffde6

由 Andreas Gruenbacher 提交于 8月 08, 2014

There is no need to have these two as inline functions.  In addition,
drbd_should_send_out_of_sync() is only used in a single place, anyway.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2e9ffde6

drbd: Remove pointless check · 3b8a44f8

由 Philipp Reisner 提交于 10月 29, 2014

In drbd-8.4 there is always a single connection per resource,
and there is always exactly one peer_device for a device.
peer_device can not be NULL here.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

3b8a44f8

20 11月, 2015 4 次提交

mtip32xx: use formatting capability of kthread_create_on_node · 8aeea031

由 Rasmus Villemoes 提交于 11月 20, 2015

kthread_create_on_node takes format+args, so there's no need to do the
pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its
'\0') only just fits in 16 bytes, so if index could ever go above 99
we'd have a stack buffer overflow.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

8aeea031

null_blk: do not del gendisk with lightnvm · 54514aa4

由 Matias Bjørling 提交于 11月 19, 2015

The gendisk structure has not been initialized when using lightnvm.
Make sure to not delete it upon exit. Also make sure that we use the
appropriate disk_name at unregistration.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

54514aa4

null_blk: use device addressing mode · 5b40db99

由 Matias Bjørling 提交于 11月 19, 2015

The linear addressing mode was removed in 7386af27. Make null_blk instead
expose the ppa format geometry and support the generic addressing mode.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

5b40db99

null_blk: use ppa_cache pool · 6bb9535b

由 Matias Bjørling 提交于 11月 19, 2015

Instead of using a page pool, we can save memory by only allocating room
for 64 entries for the ppa command. Introduce a ppa_cache to allocate only
the required memory for the ppa list.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

6bb9535b

17 11月, 2015 1 次提交

null_blk: register as a LightNVM device · b2b7e001

由 Matias Bjørling 提交于 11月 12, 2015

Add support for registering as a LightNVM device. This allows us to
evaluate the performance of the LightNVM subsystem.

In /drivers/Makefile, LightNVM is moved above block device drivers
to make sure that the LightNVM media managers have been initialized
before drivers under /drivers/block are initialized.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Fix by Jens Axboe to remove unneeded slab cache and the following
memory leak.
Signed-off-by: NJens Axboe <axboe@fb.com>

b2b7e001

12 11月, 2015 1 次提交

brd: Refuse improperly aligned discard requests · 2dbe5495

由 Jan Kara 提交于 11月 04, 2015

Currently when improperly aligned discard request is submitted, we just
silently discard more / less data which results in filesystem corruption
in some cases. Refuse such misaligned requests.
Signed-off-by: NJan Kara <jack@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2dbe5495

08 11月, 2015 1 次提交

block: change ->make_request_fn() and users to return a queue cookie · dece1635

由 Jens Axboe 提交于 11月 05, 2015

No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: NJens Axboe <axboe@fb.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NKeith Busch <keith.busch@intel.com>

dece1635

07 11月, 2015 6 次提交

signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29

由 Oleg Nesterov 提交于 11月 06, 2015

1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
   matches another "for kthreads only" kernel_sigaction() helper.

2. Remove the "tsk" and "mask" arguments, they are always current
   and current->blocked. And it is simply wrong if tsk != current.

3. We could also remove the 3rd "siginfo_t *info" arg but it looks
   potentially useful. However we can simplify the callers if we
   change kernel_dequeue_signal() to accept info => NULL.

4. Remove _irqsave, it is never called from atomic context.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NTejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be0e6f29

zram: make is_partial_io/valid_io_request/page_zero_filled return boolean · 1c53e0d2

由 Geliang Tang 提交于 11月 06, 2015

Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean,
since each function only uses either one or zero as its return value.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1c53e0d2

zram: keep the exact overcommited value in mem_used_max · 12372755

由 Sergey SENOZHATSKY 提交于 11月 06, 2015

`mem_used_max' is designed to store the max amount of memory zram consumed
to store the data.  However, it does not represent the actual
'overcommited' (max) value.  The existing code goes to -ENOMEM
overcommited case before it updates `->stats.max_used_pages', which hides
the reason we went to -ENOMEM in the first place -- we actually used more
memory than `->limit_pages':

        alloced_pages = zs_get_total_pages(meta->mem_pool);
        if (zram->limit_pages && alloced_pages > zram->limit_pages) {
                zs_free(meta->mem_pool, handle);
                ret = -ENOMEM;
                goto out;
        }

        update_used_max(zram, alloced_pages);

Which is misleading.  User will see -ENOMEM, check `->limit_pages', check
`->stats.max_used_pages', which will keep the value BEFORE zram passed
`->limit_pages', and see:
	`->stats.max_used_pages' < `->limit_pages'

Move update_used_max() before we do `->limit_pages' check, so that
user will see:
	`->stats.max_used_pages' > `->limit_pages'
should the overcommit and -ENOMEM happen.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12372755

zram: introduce comp algorithm fallback functionality · 1d5b43bf

由 Luis Henriques 提交于 11月 06, 2015

When the user supplies an unsupported compression algorithm, keep the
previously selected one (knowingly supported) or the default one (if the
compression algorithm hasn't been changed yet).

Note that previously this operation (i.e. setting an invalid algorithm)
would result in no algorithm being selected, which means that this
represents a small change in the default behaviour.

Minchan said:

For initializing zram, we need to set up 3 optional parameters in advance.

1. the number of compression streams
2. memory limitation
3. compression algorithm

Although user pass completely wrong value to set up for 1 and 2
parameters, it's okay because they have default value so zram will be
initialized with the default value (of course, when user passes a wrong
value via *echo*, sysfs returns -EINVAL so the user can notice it).

But 3 is not consistent with other optional parameters.  IOW, if the
user passes a wrong value to set up 3 parameter, zram's initialization
would fail unlike other optional parameters.

So this patch makes them consistent.
Signed-off-by: NLuis Henriques <luis.henriques@canonical.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d5b43bf

mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM · 71baba4b

由 Mel Gorman 提交于 11月 06, 2015

__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep.  Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep.  The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake.  As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags.  This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71baba4b

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc

由 Mel Gorman 提交于 11月 06, 2015

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d0164adc

03 11月, 2015 5 次提交

rbd: remove duplicate calls to rbd_dev_mapping_clear() · 4afb04c0

由 Ilya Dryomov 提交于 10月 22, 2015

Commit d1cf5788 ("rbd: set mapping info earlier") defined
rbd_dev_mapping_clear(), but, just a few days after, commit
f35a4dee ("rbd: set the mapping size and features later") moved
rbd_dev_mapping_set() calls and added another rbd_dev_mapping_clear()
call instead of moving the old one. Around the same time, another
duplicate was introduced in rbd_dev_device_release() - kill both.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

4afb04c0

rbd: set device_type::release instead of device::release · 6cac4695

由 Ilya Dryomov 提交于 10月 16, 2015

No point in providing an empty device_type::release callback and then
setting device::release for each rbd_dev dynamically.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

6cac4695

rbd: don't free rbd_dev outside of the release callback · dd5ac32d

由 Ilya Dryomov 提交于 10月 16, 2015

struct rbd_device has struct device embedded in it, which means it's
part of kobject universe and has an unpredictable life cycle.  Freeing
its memory outside of the release callback is flawed, yet commits
200a6a8b ("rbd: don't destroy rbd_dev in device release function")
and 8ad42cd0 ("rbd: don't have device release destroy rbd_dev")
moved rbd_dev_destroy() out to rbd_dev_image_release().

This commit reverts most of that, the key points are:

- rbd_dev->dev is initialized in rbd_dev_create(), making it possible
  to use rbd_dev_destroy() - which is just a put_device() - both before
  we register with device core and after.

- rbd_dev_release() (the release callback) is the only place we
  kfree(rbd_dev).  It's also where we do module_put(), keeping the
  module unload race window as small as possible.

- We pin the module in rbd_dev_create(), but only for mapping
  rbd_dev-s.  Moving image related stuff out of struct rbd_device into
  another struct which isn't tied with sysfs and device core is long
  overdue, but until that happens, this will keep rbd module refcount
  (which users can observe with lsmod) sane.

Fixes: http://tracker.ceph.com/issues/12697

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

dd5ac32d

rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails · b51c83c2

由 Ilya Dryomov 提交于 10月 15, 2015

Returning pool id (i.e. >= 0) from a sysfs ->store() callback makes
userspace think it needs to retry the write.  Fix it - it's a leftover
from the times when the equivalent of rbd_dev_create() was the first
action in rbd_add().
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b51c83c2

rbd: drop null test before destroy functions · 13bf2834

由 Julia Lawall 提交于 9月 13, 2015

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL) {
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
  x = NULL;
-}
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

13bf2834

31 10月, 2015 1 次提交

rbd: require stable pages if message data CRCs are enabled · bae818ee

由 Ronny Hegewald 提交于 10月 15, 2015

rbd requires stable pages, as it performs a crc of the page data before
they are send to the OSDs.

But since kernel 3.9 (patch 1d1d1a76
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.

This patch sets the necessary flag to get stable pages back for rbd.

In a ceph installation that provides multiple ext4 formatted rbd
devices "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD-logs before this patch. After this patch this messages are pretty
much gone (only ca 1-2 / month / OSD).

Cc: stable@vger.kernel.org # 3.9+, needs backporting
Signed-off-by: NRonny Hegewald <Ronny.Hegewald@online.de>
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

bae818ee

24 10月, 2015 2 次提交

rbd: prevent kernel stack blow up on rbd map · 6d69bb53

由 Ilya Dryomov 提交于 10月 11, 2015

Mapping an image with a long parent chain (e.g. image foo, whose parent
is bar, whose parent is baz, etc) currently leads to a kernel stack
overflow, due to the following recursion in the reply path:

  rbd_osd_req_callback()
    rbd_obj_request_complete()
      rbd_img_obj_callback()
        rbd_img_parent_read_callback()
          rbd_obj_request_complete()
            ...

Limit the parent chain to 16 images, which is ~5K worth of stack.  When
the above recursion is eliminated, this limit can be lifted.

Fixes: http://tracker.ceph.com/issues/12538

Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJosh Durgin <jdurgin@redhat.com>

6d69bb53

rbd: don't leak parent_spec in rbd_dev_probe_parent() · 1f2c6651

由 Ilya Dryomov 提交于 10月 11, 2015

Currently we leak parent_spec and trigger a "parent reference
underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
The problem is we take the !parent out_err branch and that only drops
refcounts; parent_spec that would've been freed had we called
rbd_dev_unparent() remains and triggers rbd_warn() in
rbd_dev_parent_put() - at that point we have parent_spec != NULL and
parent_ref == 0, so counter ends up being -1 after the decrement.

Redo rbd_dev_probe_parent() to fix this.

Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

1f2c6651

23 10月, 2015 6 次提交

xen/xenbus: Rename *RING_PAGE* to *RING_GRANT* · 9cce2914

由 Julien Grall 提交于 10月 13, 2015

Linux may use a different page size than the size of grant. So make
clear that the order is actually in number of grant.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

9cce2914

block/xen-blkback: Make it running on 64KB page granularity · 67de5dfb

由 Julien Grall 提交于 5月 05, 2015

The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity behaving as a
block backend on a non-modified Xen.

It's only necessary to adapt the ring size and the number of request per
indirect frames. The rest of the code is relying on the grant table
code.

Note that the grant table code is allocating a Linux page per grant
which will result to waste 6OKB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page between
multiple grants.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Acked-by: N"Roger Pau Monné" <roger.pau@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

67de5dfb

block/xen-blkfront: Make it running on 64KB page granularity · c004a6fe

由 Julien Grall 提交于 7月 22, 2015

The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using block
device on a non-modified Xen.

The block API is using segment which should at least be the size of a
Linux page. Therefore, the driver will have to break the page in chunk
of 4K before giving the page to the backend.

When breaking a 64KB segment in 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always require to have data in the
chunk, we have to count the number of Xen page which will be in use and
avoid sending empty chunks.

Note that, a pre-defined number of grants are reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants is
reserved per segment with 64KB page granularity).

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will be effectively
in use. This could be improved by sharing the page with multiple grants.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

c004a6fe

block/xen-blkfront: split get_grant in 2 · 4f503fbd

由 Julien Grall 提交于 7月 01, 2015

Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grant is used, each page of a bio request
may be split in multiple grant.

Furthermore, the field page of the grant structure is only used to copy
data from persistent grant or indirect grant. Avoid to set it for other
use case as it will have no meaning given the page will be split in
multiple grant.

Provide 2 functions, to setup indirect grant, the other for bio page.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

4f503fbd

block/xen-blkfront: Store a page rather a pfn in the grant structure · a7a6df22

由 Julien Grall 提交于 6月 30, 2015

All the usage of the field pfn are done using the same idiom:

pfn_to_page(grant->pfn)

This will  return always the same page. Store directly the page in the
grant to clean up the code.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Acked-by: NRoger Pau Monné <roger.pau@citrix.com>
Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a7a6df22

block/xen-blkfront: Split blkif_queue_request in 2 · 33204663

由 Julien Grall 提交于 6月 29, 2015

Currently, blkif_queue_request has 2 distinct execution path:
    - Send a discard request
    - Send a read/write request

The function is also allocating grants to use for generating the
request. Although, this is only used for read/write request.

Rather than having a function with 2 distinct execution path, separate
the function in 2. This will also remove one level of tabulation.
Signed-off-by: NJulien Grall <julien.grall@citrix.com>
Reviewed-by: NRoger Pau Monné <roger.pau@citrix.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

33204663

16 10月, 2015 3 次提交

rbd: use writefull op for object size writes · e30b7577

由 Ilya Dryomov 提交于 10月 07, 2015

This covers only the simplest case - an object size sized write, but
it's still useful in tiering setups when EC is used for the base tier
as writefull op can be proxied, saving an object promotion.

Even though updating ceph_osdc_new_request() to allow writefull should
just be a matter of fixing an assert, I didn't do it because its only
user is cephfs.  All other sites were updated.

Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

e30b7577

rbd: set max_sectors explicitly · 0d9fde4f

由 Ilya Dryomov 提交于 10月 07, 2015

Commit 30e2bc08 ("Revert "block: remove artifical max_hw_sectors
cap"") restored a clamp on max_sectors.  It's now 2560 sectors instead
of 1024, but it's not good enough: we set max_hw_sectors to rbd object
size because we don't want object sized I/Os to be split, and the
default object size is 4M.

So, set max_sectors to max_hw_sectors in rbd at queue init time.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

0d9fde4f

NVMe: Fix memory leak on retried commands · 0dfc70c3

由 Keith Busch 提交于 10月 15, 2015

Resources are reallocated for requeued commands, so unmap and release
the iod for the failed command.

It's a pretty bad memory leak and causes a kernel hang if you remove a
drive because of a busy dma pool. You'll get messages spewing like this:

  nvme 0000:xx:xx.x: dma_pool_destroy prp list 256, ffff880420dec000 busy

and lock up pci and the driver since removal never completes while
holding a lock.

Cc: stable@vger.kernel.org
Cc: <stable@vger.kernel.org> # 4.0.x-
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0dfc70c3

15 10月, 2015 3 次提交

nvme: use an integer value to Linux errno values · 81c04b94

由 Christoph Hellwig 提交于 10月 12, 2015

Use a separate integer variable to hold the signed Linux errno
values we pass back to the block layer.  Note that for pass through
commands those might still be NVMe values, but those fit into the
int as well.

Fixes: f4829a9b: ("blk-mq: fix racy updates of rq->errors")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

81c04b94

drbd: stop including <asm-generic/kmap_types.h> · dbcbdc43

由 Christoph Hellwig 提交于 8月 28, 2015

<linux/highmem.h> is the placace the get the kmap type flags, asm-generic
files are generic implementations only to be used by architecture code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

dbcbdc43

move io-64-nonatomic*.h out of asm-generic · 2f8e2c87

由 Christoph Hellwig 提交于 8月 28, 2015

These are not implementations of default architecture code but helpers
for drivers. Move them to the place they belong to.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NDarren Hart <dvhart@linux.intel.com>
Acked-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

2f8e2c87

13 10月, 2015 1 次提交

nvme: fix 32-bit build warning · 835da3f9

由 Arnd Bergmann 提交于 10月 06, 2015

Compiling the nvme driver on 32-bit warns about a cast from a __u64
variable to a pointer:

drivers/block/nvme-core.c: In function 'nvme_submit_io':
drivers/block/nvme-core.c:1847:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    (void __user *)io.addr, length, NULL, 0);

The cast here is intentional and safe, so we can shut up the
gcc warning by adding an intermediate cast to 'uintptr_t'.

I had previously submitted a patch to fix this problem in the
nvme driver, but it was accepted on the same day that two new
warnings got added.

For clarification, I also change the third instance of this cast
to use uintptr_t instead of unsigned long now.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: d29ec824 ("nvme: submit internal commands through the block layer")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

835da3f9

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功