提交 · 46f92d42ee37e10970e33891b7b61a342bd97aeb · openeuler / raspberrypi-kernel

23 9月, 2014 10 次提交

blk-mq: unshared timeout handler · 46f92d42

由 Christoph Hellwig 提交于 9月 13, 2014

Duplicate the (small) timeout handler in blk-mq so that we can pass
arguments more easily to the driver timeout handler.  This enables
the next patch.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

46f92d42

blk-mq: fix and simplify tag iteration for the timeout handler · 81481eb4

由 Christoph Hellwig 提交于 9月 13, 2014

Don't do a kmalloc from timer to handle timeouts, chances are we could be
under heavy load or similar and thus just miss out on the timeouts.
Fortunately it is very easy to just iterate over all in use tags, and doing
this properly actually cleans up the blk_mq_busy_iter API as well, and
prepares us for the next patch by passing a reserved argument to the
iterator.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

81481eb4

blk-mq: rename blk_mq_end_io to blk_mq_end_request · c8a446ad

由 Christoph Hellwig 提交于 9月 13, 2014

Now that we've changed the driver API on the submission side use the
opportunity to fix up the name on the completion side to fit into the
general scheme.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

c8a446ad

blk-mq: call blk_mq_start_request from ->queue_rq · e2490073

由 Christoph Hellwig 提交于 9月 13, 2014

When we call blk_mq_start_request from the core blk-mq code before calling into
->queue_rq there is a racy window where the timeout handler can hit before we've
fully set up the driver specific part of the command.

Move the call to blk_mq_start_request into the driver so the driver can start
the request only once it is fully set up.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

e2490073

blk-mq: remove REQ_END · bf572297

由 Christoph Hellwig 提交于 9月 13, 2014

Pass an explicit parameter for the last request in a batch to ->queue_rq
instead of using a request flag.  Besides being a cleaner and non-stateful
interface this is also required for the next patch, which fixes the blk-mq
I/O submission code to not start a time too early.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

bf572297

blk-mq: use blk_mq_start_hw_queues() when running requeue work · 8b957415

由 Jens Axboe 提交于 9月 19, 2014

When requests are retried due to hw or sw resource shortages,
we often stop the associated hardware queue. So ensure that we
restart the queues when running the requeue work, otherwise the
queue run will be a no-op.
Signed-off-by: NJens Axboe <axboe@fb.com>

8b957415

blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps() · 6b55e1f2

由 Jens Axboe 提交于 9月 19, 2014

__blk_mq_alloc_rq_maps() can be invoked multiple times, if we scale
back the queue depth if we are low on memory. So don't clear
set->tags when we fail, this is handled directly in
the parent function, blk_mq_alloc_tag_set().
Reported-by: NRobert Elliott  <Elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

6b55e1f2

blk-mq: avoid infinite recursion with the FUA flag · a57a178a

由 Christoph Hellwig 提交于 9月 16, 2014

We should not insert requests into the flush state machine from
blk_mq_insert_request.  All incoming flush requests come through
blk_{m,s}q_make_request and are handled there, while blk_execute_rq_nowait
should only be called for BLOCK_PC requests.  All other callers
deal with requests that already went through the flush statemchine
and shouldn't be reinserted into it.
Reported-by: NRobert Elliott  <Elliott@hp.com>
Debugged-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

a57a178a

blk-mq: Avoid race condition with uninitialized requests · 683d0e12

由 David Hildenbrand 提交于 9月 18, 2014

This patch should fix the bug reported in
https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized
pointers when racing with the creation of a request.

Also move the reset of cmd_flags for the initializing code to the point
where a request is freed. So we will never end up with pending flush
request indicators that might trigger dereferences of invalid pointers
in blk_mq_timeout_check().

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Reported-by: NPaulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Tested-by: NPaulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

683d0e12

blk-mq: request deadline must be visible before marking rq as started · 538b7534

由 Jens Axboe 提交于 9月 16, 2014

When we start the request, we set the deadline and flip the bits
marking the request as started and non-complete. However, it's
important that the deadline store is ordered before flipping the
bits, otherwise we could have a small window where the request is
marked started but with an invalid deadline. This can confuse the
timeout handling.
Suggested-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

538b7534

10 9月, 2014 2 次提交

blk-mq: scale depth and rq map appropriate if low on memory · a5164405

由 Jens Axboe 提交于 9月 10, 2014

If we are running in a kdump environment, resources are scarce.
For some SCSI setups with a huge set of shared tags, we run out
of memory allocating what the drivers is asking for. So implement
a scale back logic to reduce the tag depth for those cases, allowing
the driver to successfully load.

We should extend this to detect low memory situations, and implement
a sane fallback for those (1 queue, 64 tags, or something like that).
Tested-by: NRobert Elliott <elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a5164405

Block: fix unbalanced bypass-disable in blk_register_queue · df35c7c9

由 Alan Stern 提交于 9月 09, 2014

When a queue is registered, the block layer turns off the bypass
setting (because bypass is enabled when the queue is created).  This
doesn't work well for queues that are unregistered and then registered
again; we get a WARNING because of the unbalanced calls to
blk_queue_bypass_end().

This patch fixes the problem by making blk_register_queue() call
blk_queue_bypass_end() only the first time the queue is registered.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Acked-by: NTejun Heo <tj@kernel.org>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NJens Axboe <axboe@fb.com>

df35c7c9

09 9月, 2014 1 次提交

block, bdi: an active gendisk always has a request_queue associated with it · ff9ea323

由 Tejun Heo 提交于 9月 08, 2014

bdev_get_queue() returns the request_queue associated with the
specified block_device.  blk_get_backing_dev_info() makes use of
bdev_get_queue() to determine the associated bdi given a block_device.

All the callers of bdev_get_queue() including
blk_get_backing_dev_info() assume that bdev_get_queue() may return
NULL and implement NULL handling; however, bdev_get_queue() requires
the passed in block_device is opened and attached to its gendisk.
Because an active gendisk always has a valid request_queue associated
with it, bdev_get_queue() can never return NULL and neither can
blk_get_backing_dev_info().

Make it clear that neither of the two functions can return NULL and
remove NULL handling from all the callers.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

ff9ea323

08 9月, 2014 1 次提交

blkcg: remove blkcg->id · f4da8072

由 Tejun Heo 提交于 9月 08, 2014

blkcg->id is a unique id given to each blkcg; however, the
cgroup_subsys_state which each blkcg embeds already has ->serial_nr
which can be used for the same purpose.  Drop blkcg->id and replace
its uses with blkcg->css.serial_nr.  Rename cfq_cgroup->blkcg_id to
->blkcg_serial_nr and @id in check_blkcg_changed() to @serial_nr for
consistency.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f4da8072

04 9月, 2014 2 次提交

block: Fix dev_t minor allocation lifetime · 2da78092

由 Keith Busch 提交于 8月 26, 2014

Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

2da78092

blk-mq: cleanup after blk_mq_init_rq_map failures · 5676e7b6

由 Robert Elliott 提交于 9月 02, 2014

In blk-mq.c blk_mq_alloc_tag_set, if:
	set->tags = kmalloc_node()
succeeds, but one of the blk_mq_init_rq_map() calls fails,
	goto out_unwind;
needs to free set->tags so the caller is not obligated
to do so.  None of the current callers (null_blk,
virtio_blk, virtio_blk, or the forthcoming scsi-mq)
do so.

set->tags needs to be set to NULL after doing so,
so other tag cleanup logic doesn't try to free
a stale pointer later.  Also set it to NULL
in blk_mq_free_tag_set.

Tested with error injection on the forthcoming
scsi-mq + hpsa combination.
Signed-off-by: NRobert Elliott <elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

5676e7b6

03 9月, 2014 1 次提交

blk-merge: fix blk_recount_segments · 07388549

由 Ming Lei 提交于 9月 02, 2014

QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
so bio->bi_phys_segment computed may be bigger than
queue_max_segments(q) for blk-mq devices, then drivers will
fail to handle the case, for example, BUG_ON() in
virtio_queue_rq() can be triggerd for virtio-blk:

	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio->bi_phys_segment is bigger than
queue_max_segments(q), and the regression is caused by commit
05f1dd53(block: add queue flag for disabling SG merging).
Reported-by: NKick In <pierre-andre.morey@canonical.com>
Tested-by: NChris J Arges <chris.j.arges@canonical.com>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

07388549

29 8月, 2014 2 次提交

bsg: fix potential error pointer dereference · 55872c5a

由 Jens Axboe 提交于 8月 28, 2014

Dan writes:

block/bsg.c:327 bsg_map_hdr() error: 'next_rq' dereferencing possible
ERR_PTR().

Fix this by setting next_rq to NULL, for the case where it can be
!= NULL but an error pointer.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

55872c5a

block,scsi: fixup blk_get_request dead queue scenarios · a492f075

由 Joe Lawrence 提交于 8月 28, 2014

The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.

For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request.  It may fail if the queue is dead, or the caller was
unwilling to wait.
Signed-off-by: NJoe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a492f075

28 8月, 2014 1 次提交

cfq-iosched: Add comments on update timing of weight · 7b5af5cf

由 Toshiaki Makita 提交于 8月 28, 2014

Explain that weight has to be updated on activation.
This complements previous fix e15693ef ("cfq-iosched: Fix wrong
children_weight calculation").
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NJens Axboe <axboe@fb.com>

7b5af5cf

27 8月, 2014 2 次提交

block,scsi: verify return pointer from blk_get_request · eb571eea

由 Joe Lawrence 提交于 7月 02, 2014

The blk-core dead queue checks introduce an error scenario to
blk_get_request that returns NULL if the request queue has been
shutdown. This affects the behavior for __GFP_WAIT callers, who should
verify the return value before dereferencing.
Signed-off-by: NJoe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

eb571eea

cfq-iosched: Fix wrong children_weight calculation · e15693ef

由 Toshiaki Makita 提交于 8月 26, 2014

cfq_group_service_tree_add() is applying new_weight at the beginning of
the function via cfq_update_group_weight().
This actually allows weight to change between adding it to and subtracting
it from children_weight, and triggers WARN_ON_ONCE() in
cfq_group_service_tree_del(), or even causes oops by divide error during
vfr calculation in cfq_group_service_tree_add().

The detailed scenario is as follows:
1. Create blkio cgroups X and Y as a child of X.
   Set X's weight to 500 and perform some I/O to apply new_weight.
   This X's I/O completes before starting Y's I/O.
2. Y starts I/O and cfq_group_service_tree_add() is called with Y.
3. cfq_group_service_tree_add() walks up the tree during children_weight
   calculation and adds parent X's weight (500) to children_weight of root.
   children_weight becomes 500.
4. Set X's weight to 1000.
5. X starts I/O and cfq_group_service_tree_add() is called with X.
6. cfq_group_service_tree_add() applies its new_weight (1000).
7. I/O of Y completes and cfq_group_service_tree_del() is called with Y.
8. I/O of X completes and cfq_group_service_tree_del() is called with X.
9. cfq_group_service_tree_del() subtracts X's weight (1000) from
   children_weight of root. children_weight becomes -500.
   This triggers WARN_ON_ONCE().
10. Set X's weight to 500.
11. X starts I/O and cfq_group_service_tree_add() is called with X.
12. cfq_group_service_tree_add() applies its new_weight (500) and adds it
    to children_weight of root. children_weight becomes 0. Calcularion of
    vfr triggers oops by divide error.

weight should be updated right before adding it to children_weight.
Reported-by: NRuki Sekiya <sekiya.ruki@lab.ntt.co.jp>
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

e15693ef

26 8月, 2014 1 次提交

block: fix error handling in sg_io · d19d7446

由 Sabrina Dubroca 提交于 8月 26, 2014

Before commit 2cada584 ("block: cleanup error handling in sg_io"),
we had ret = 0 before entering the last big if block of sg_io.

Since 2cada584, ret = -EFAULT, which breaks hdparm:

/dev/sda:
 setting Advanced Power Management level to 0xc8 (200)
 HDIO_DRIVE_CMD failed: Bad address
 APM_level      = 128
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Fixes: 2cada584 ("block: cleanup error handling in sg_io")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d19d7446

23 8月, 2014 2 次提交

fix regression in SCSI_IOCTL_SEND_COMMAND · 2ba136da

由 Tony Battersby 提交于 8月 22, 2014

blk_rq_set_block_pc() memsets rq->cmd to 0, so it should come
immediately after blk_get_request() to avoid overwriting the
user-supplied CDB.  Also check for failure to allocate rq.

Fixes: f27b087b ("block: add blk_rq_set_block_pc()")
Cc: <stable@vger.kernel.org> # 3.16.x
Signed-off-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2ba136da

scsi-mq: fix requests that use a separate CDB buffer · 6f4a1626

由 Tony Battersby 提交于 8月 22, 2014

This patch fixes code such as the following with scsi-mq enabled:

    rq = blk_get_request(...);
    blk_rq_set_block_pc(rq);

    rq->cmd = my_cmd_buffer; /* separate CDB buffer */

    blk_execute_rq_nowait(...);

Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only).  Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.
Signed-off-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

6f4a1626

22 8月, 2014 6 次提交

block: support > 16 byte CDBs for SG_IO · a57821ca

由 Christoph Hellwig 提交于 8月 21, 2014

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBoaz Harrosh <boaz@plexistor.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a57821ca

block: cleanup error handling in sg_io · 2cada584

由 Christoph Hellwig 提交于 8月 21, 2014

Make sure we always clean up through the out label and just have
a single place to put the request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

2cada584

blk-mq: blk_mq_freeze_queue() should allow nesting · cddd5d17

由 Tejun Heo 提交于 8月 16, 2014

While converting to percpu_ref for freezing, add703fd ("blk-mq:
use percpu_ref for mq usage count") incorrectly made
blk_mq_freeze_queue() misbehave when freezing is nested due to
percpu_ref_kill() being invoked on an already killed ref.

Fix it by making blk_mq_freeze_queue() kill and kick the queue only
for the outermost freeze attempt.  All the nested ones can simply wait
for the ref to reach zero.

While at it, remove unnecessary @wake initialization from
blk_mq_unfreeze_queue().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

cddd5d17

J
blk-mq: correct a few wrong/bad comments · a68aafa5
由 Jens Axboe 提交于 8月 15, 2014
```
Just grammar or spelling errors, nothing major.
Signed-off-by: NJens Axboe <axboe@fb.com>
```
a68aafa5

block: Fix BUG_ON when pi errors occur · 16f408dc

由 Sagi Grimberg 提交于 8月 13, 2014

When getting a pi error we get to bio_integrity_end_io with
bi_remaining already decremented to 0 where we will eventually
need to call bio_endio with restored original bio completion handler.
Calling bio_endio invokes a BUG_ON(). We should call bio_endio_nodec
instead, like what is done in bio_integrity_verify_fn.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

16f408dc

blk-mq: don't allow merges if turned off for the queue · 274a5843

由 Jens Axboe 提交于 8月 15, 2014

blk-mq uses BLK_MQ_F_SHOULD_MERGE, as set by the driver at init time,
to determine whether it should merge IO or not. However, this could
also be disabled by the admin, if merging is switched off through
sysfs. So check the general queue state as well before attempting
to merge IO.
Reported-by: NRob Elliott <Elliott@hp.com>
Tested-by: NRob Elliott <Elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

274a5843

16 8月, 2014 1 次提交

blk-mq: fix WARNING "percpu_ref_kill() called more than once!" · dd840087

由 Ming Lei 提交于 8月 15, 2014

Before doing queue release, the queue has been freezed already
by blk_cleanup_queue(), so needn't to freeze queue for deleting
tag set.

This patch fixes the WARNING of "percpu_ref_kill() called more than once!"
which is triggered during unloading block driver.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

dd840087

06 8月, 2014 1 次提交

partitions: aix.c: off by one bug · d97a86c1

由 Dan Carpenter 提交于 8月 05, 2014

The lvip[] array has "state->limit" elements so the condition here
should be >= instead of >.

Fixes: 6ceea22b ('partitions: add aix lvm partition support files')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NPhilippe De Muyter <phdm@macqel.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

d97a86c1

02 8月, 2014 1 次提交

block: use kmalloc alignment for bio slab · 6a241483

由 Mikulas Patocka 提交于 3月 28, 2014

Various subsystems can ask the bio subsystem to create a bio slab cache
with some free space before the bio.  This free space can be used for any
purpose.  Device mapper uses this per-bio-data feature to place some
target-specific and device-mapper specific data before the bio, so that
the target-specific data doesn't have to be allocated separately.

This per-bio-data mechanism is used in place of kmalloc, so we need the
allocated slab to have the same memory alignment as memory allocated
with kmalloc.

Change bio_find_or_create_slab() so that it uses ARCH_KMALLOC_MINALIGN
alignment when creating the slab cache.  This is needed so that dm-crypt
can use per-bio-data for encryption - the crypto subsystem assumes this
data will have the same alignment as kmalloc'ed memory.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NJens Axboe <axboe@fb.com>

6a241483

16 7月, 2014 1 次提交

blkcg: don't call into policy draining if root_blkg is already gone · 2a1b4cf2

由 Tejun Heo 提交于 7月 05, 2014

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bc ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NShirish Pargaonkar <spargaonkar@suse.com>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Reported-by: NJet Chen <jet.chen@intel.com>
Cc: stable@vger.kernel.org
Tested-by: NShirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2a1b4cf2

15 7月, 2014 3 次提交

cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5

由 Tejun Heo 提交于 7月 15, 2014

Currently, cftypes added by cgroup_add_cftypes() are used for both the
unified default hierarchy and legacy ones and subsystems can mark each
file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
appear only on one of them.  This is quite hairy and error-prone.
Also, we may end up exposing interface files to the default hierarchy
without thinking it through.

cgroup_subsys will grow two separate cftype addition functions and
apply each only on the hierarchies of the matching type.  This will
allow organizing cftypes in a lot clearer way and encourage subsystems
to scrutinize the interface which is being exposed in the new default
hierarchy.

In preparation, this patch adds cgroup_add_legacy_cftypes() which
currently is a simple wrapper around cgroup_add_cftypes() and replaces
all cgroup_add_cftypes() usages with it.

While at it, this patch drops a completely spurious return from
__hugetlb_cgroup_file_init().

This patch doesn't introduce any functional differences.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

2cf669a5

cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e

由 Tejun Heo 提交于 7月 15, 2014

Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them.  This is quite hairy and error-prone.  Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.

cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type.  This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.

In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes.  This patch is pure rename.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

5577964e

J
Revert "bio: modify __bio_add_page() to accept pages that don't start a new segment" · 26a33794
由 Jens Axboe 提交于 7月 14, 2014
```
This reverts commit 254c4407.

It causes crashes with cryptsetup, even after a few iterations and
updates. Drop it for now.
```
26a33794

14 7月, 2014 1 次提交

block: provide compat ioctl for BLKZEROOUT · 3b3a1814

由 Mikulas Patocka 提交于 7月 02, 2014

This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer
to two uint64_t values, so there is no need to translate it.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# 3.7+
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

3b3a1814

12 7月, 2014 1 次提交

blkcg: don't call into policy draining if root_blkg is already gone · 0b462c89

由 Tejun Heo 提交于 7月 05, 2014

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bc ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NShirish Pargaonkar <spargaonkar@suse.com>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Reported-by: NJet Chen <jet.chen@intel.com>
Cc: stable@vger.kernel.org
Tested-by: NShirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

0b462c89