提交 · 538b75341835e3c2041ff066408de10d24fdc830 · openeuler / raspberrypi-kernel

23 9月, 2014 1 次提交

blk-mq: request deadline must be visible before marking rq as started · 538b7534

由 Jens Axboe 提交于 9月 16, 2014

When we start the request, we set the deadline and flip the bits
marking the request as started and non-complete. However, it's
important that the deadline store is ordered before flipping the
bits, otherwise we could have a small window where the request is
marked started but with an invalid deadline. This can confuse the
timeout handling.
Suggested-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

538b7534

10 9月, 2014 2 次提交

blk-mq: scale depth and rq map appropriate if low on memory · a5164405

由 Jens Axboe 提交于 9月 10, 2014

If we are running in a kdump environment, resources are scarce.
For some SCSI setups with a huge set of shared tags, we run out
of memory allocating what the drivers is asking for. So implement
a scale back logic to reduce the tag depth for those cases, allowing
the driver to successfully load.

We should extend this to detect low memory situations, and implement
a sane fallback for those (1 queue, 64 tags, or something like that).
Tested-by: NRobert Elliott <elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a5164405

Block: fix unbalanced bypass-disable in blk_register_queue · df35c7c9

由 Alan Stern 提交于 9月 09, 2014

When a queue is registered, the block layer turns off the bypass
setting (because bypass is enabled when the queue is created).  This
doesn't work well for queues that are unregistered and then registered
again; we get a WARNING because of the unbalanced calls to
blk_queue_bypass_end().

This patch fixes the problem by making blk_register_queue() call
blk_queue_bypass_end() only the first time the queue is registered.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Acked-by: NTejun Heo <tj@kernel.org>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NJens Axboe <axboe@fb.com>

df35c7c9

04 9月, 2014 2 次提交

block: Fix dev_t minor allocation lifetime · 2da78092

由 Keith Busch 提交于 8月 26, 2014

Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

2da78092

blk-mq: cleanup after blk_mq_init_rq_map failures · 5676e7b6

由 Robert Elliott 提交于 9月 02, 2014

In blk-mq.c blk_mq_alloc_tag_set, if:
	set->tags = kmalloc_node()
succeeds, but one of the blk_mq_init_rq_map() calls fails,
	goto out_unwind;
needs to free set->tags so the caller is not obligated
to do so.  None of the current callers (null_blk,
virtio_blk, virtio_blk, or the forthcoming scsi-mq)
do so.

set->tags needs to be set to NULL after doing so,
so other tag cleanup logic doesn't try to free
a stale pointer later.  Also set it to NULL
in blk_mq_free_tag_set.

Tested with error injection on the forthcoming
scsi-mq + hpsa combination.
Signed-off-by: NRobert Elliott <elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

5676e7b6

03 9月, 2014 1 次提交

blk-merge: fix blk_recount_segments · 07388549

由 Ming Lei 提交于 9月 02, 2014

QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
so bio->bi_phys_segment computed may be bigger than
queue_max_segments(q) for blk-mq devices, then drivers will
fail to handle the case, for example, BUG_ON() in
virtio_queue_rq() can be triggerd for virtio-blk:

	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio->bi_phys_segment is bigger than
queue_max_segments(q), and the regression is caused by commit
05f1dd53(block: add queue flag for disabling SG merging).
Reported-by: NKick In <pierre-andre.morey@canonical.com>
Tested-by: NChris J Arges <chris.j.arges@canonical.com>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

07388549

28 8月, 2014 1 次提交

cfq-iosched: Add comments on update timing of weight · 7b5af5cf

由 Toshiaki Makita 提交于 8月 28, 2014

Explain that weight has to be updated on activation.
This complements previous fix e15693ef ("cfq-iosched: Fix wrong
children_weight calculation").
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NJens Axboe <axboe@fb.com>

7b5af5cf

27 8月, 2014 1 次提交

cfq-iosched: Fix wrong children_weight calculation · e15693ef

由 Toshiaki Makita 提交于 8月 26, 2014

cfq_group_service_tree_add() is applying new_weight at the beginning of
the function via cfq_update_group_weight().
This actually allows weight to change between adding it to and subtracting
it from children_weight, and triggers WARN_ON_ONCE() in
cfq_group_service_tree_del(), or even causes oops by divide error during
vfr calculation in cfq_group_service_tree_add().

The detailed scenario is as follows:
1. Create blkio cgroups X and Y as a child of X.
   Set X's weight to 500 and perform some I/O to apply new_weight.
   This X's I/O completes before starting Y's I/O.
2. Y starts I/O and cfq_group_service_tree_add() is called with Y.
3. cfq_group_service_tree_add() walks up the tree during children_weight
   calculation and adds parent X's weight (500) to children_weight of root.
   children_weight becomes 500.
4. Set X's weight to 1000.
5. X starts I/O and cfq_group_service_tree_add() is called with X.
6. cfq_group_service_tree_add() applies its new_weight (1000).
7. I/O of Y completes and cfq_group_service_tree_del() is called with Y.
8. I/O of X completes and cfq_group_service_tree_del() is called with X.
9. cfq_group_service_tree_del() subtracts X's weight (1000) from
   children_weight of root. children_weight becomes -500.
   This triggers WARN_ON_ONCE().
10. Set X's weight to 500.
11. X starts I/O and cfq_group_service_tree_add() is called with X.
12. cfq_group_service_tree_add() applies its new_weight (500) and adds it
    to children_weight of root. children_weight becomes 0. Calcularion of
    vfr triggers oops by divide error.

weight should be updated right before adding it to children_weight.
Reported-by: NRuki Sekiya <sekiya.ruki@lab.ntt.co.jp>
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

e15693ef

26 8月, 2014 1 次提交

block: fix error handling in sg_io · d19d7446

由 Sabrina Dubroca 提交于 8月 26, 2014

Before commit 2cada584 ("block: cleanup error handling in sg_io"),
we had ret = 0 before entering the last big if block of sg_io.

Since 2cada584, ret = -EFAULT, which breaks hdparm:

/dev/sda:
 setting Advanced Power Management level to 0xc8 (200)
 HDIO_DRIVE_CMD failed: Bad address
 APM_level      = 128
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Fixes: 2cada584 ("block: cleanup error handling in sg_io")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d19d7446

23 8月, 2014 2 次提交

fix regression in SCSI_IOCTL_SEND_COMMAND · 2ba136da

由 Tony Battersby 提交于 8月 22, 2014

blk_rq_set_block_pc() memsets rq->cmd to 0, so it should come
immediately after blk_get_request() to avoid overwriting the
user-supplied CDB.  Also check for failure to allocate rq.

Fixes: f27b087b ("block: add blk_rq_set_block_pc()")
Cc: <stable@vger.kernel.org> # 3.16.x
Signed-off-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2ba136da

scsi-mq: fix requests that use a separate CDB buffer · 6f4a1626

由 Tony Battersby 提交于 8月 22, 2014

This patch fixes code such as the following with scsi-mq enabled:

    rq = blk_get_request(...);
    blk_rq_set_block_pc(rq);

    rq->cmd = my_cmd_buffer; /* separate CDB buffer */

    blk_execute_rq_nowait(...);

Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only).  Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.
Signed-off-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

6f4a1626

22 8月, 2014 6 次提交

block: support > 16 byte CDBs for SG_IO · a57821ca

由 Christoph Hellwig 提交于 8月 21, 2014

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBoaz Harrosh <boaz@plexistor.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a57821ca

block: cleanup error handling in sg_io · 2cada584

由 Christoph Hellwig 提交于 8月 21, 2014

Make sure we always clean up through the out label and just have
a single place to put the request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

2cada584

blk-mq: blk_mq_freeze_queue() should allow nesting · cddd5d17

由 Tejun Heo 提交于 8月 16, 2014

While converting to percpu_ref for freezing, add703fd ("blk-mq:
use percpu_ref for mq usage count") incorrectly made
blk_mq_freeze_queue() misbehave when freezing is nested due to
percpu_ref_kill() being invoked on an already killed ref.

Fix it by making blk_mq_freeze_queue() kill and kick the queue only
for the outermost freeze attempt.  All the nested ones can simply wait
for the ref to reach zero.

While at it, remove unnecessary @wake initialization from
blk_mq_unfreeze_queue().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

cddd5d17

J
blk-mq: correct a few wrong/bad comments · a68aafa5
由 Jens Axboe 提交于 8月 15, 2014
```
Just grammar or spelling errors, nothing major.
Signed-off-by: NJens Axboe <axboe@fb.com>
```
a68aafa5

block: Fix BUG_ON when pi errors occur · 16f408dc

由 Sagi Grimberg 提交于 8月 13, 2014

When getting a pi error we get to bio_integrity_end_io with
bi_remaining already decremented to 0 where we will eventually
need to call bio_endio with restored original bio completion handler.
Calling bio_endio invokes a BUG_ON(). We should call bio_endio_nodec
instead, like what is done in bio_integrity_verify_fn.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

16f408dc

blk-mq: don't allow merges if turned off for the queue · 274a5843

由 Jens Axboe 提交于 8月 15, 2014

blk-mq uses BLK_MQ_F_SHOULD_MERGE, as set by the driver at init time,
to determine whether it should merge IO or not. However, this could
also be disabled by the admin, if merging is switched off through
sysfs. So check the general queue state as well before attempting
to merge IO.
Reported-by: NRob Elliott <Elliott@hp.com>
Tested-by: NRob Elliott <Elliott@hp.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

274a5843

16 8月, 2014 1 次提交

blk-mq: fix WARNING "percpu_ref_kill() called more than once!" · dd840087

由 Ming Lei 提交于 8月 15, 2014

Before doing queue release, the queue has been freezed already
by blk_cleanup_queue(), so needn't to freeze queue for deleting
tag set.

This patch fixes the WARNING of "percpu_ref_kill() called more than once!"
which is triggered during unloading block driver.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

dd840087

06 8月, 2014 1 次提交

partitions: aix.c: off by one bug · d97a86c1

由 Dan Carpenter 提交于 8月 05, 2014

The lvip[] array has "state->limit" elements so the condition here
should be >= instead of >.

Fixes: 6ceea22b ('partitions: add aix lvm partition support files')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NPhilippe De Muyter <phdm@macqel.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

d97a86c1

02 8月, 2014 1 次提交

block: use kmalloc alignment for bio slab · 6a241483

由 Mikulas Patocka 提交于 3月 28, 2014

Various subsystems can ask the bio subsystem to create a bio slab cache
with some free space before the bio.  This free space can be used for any
purpose.  Device mapper uses this per-bio-data feature to place some
target-specific and device-mapper specific data before the bio, so that
the target-specific data doesn't have to be allocated separately.

This per-bio-data mechanism is used in place of kmalloc, so we need the
allocated slab to have the same memory alignment as memory allocated
with kmalloc.

Change bio_find_or_create_slab() so that it uses ARCH_KMALLOC_MINALIGN
alignment when creating the slab cache.  This is needed so that dm-crypt
can use per-bio-data for encryption - the crypto subsystem assumes this
data will have the same alignment as kmalloc'ed memory.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NJens Axboe <axboe@fb.com>

6a241483

16 7月, 2014 1 次提交

blkcg: don't call into policy draining if root_blkg is already gone · 2a1b4cf2

由 Tejun Heo 提交于 7月 05, 2014

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bc ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NShirish Pargaonkar <spargaonkar@suse.com>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Reported-by: NJet Chen <jet.chen@intel.com>
Cc: stable@vger.kernel.org
Tested-by: NShirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2a1b4cf2

15 7月, 2014 3 次提交

cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5

由 Tejun Heo 提交于 7月 15, 2014

Currently, cftypes added by cgroup_add_cftypes() are used for both the
unified default hierarchy and legacy ones and subsystems can mark each
file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
appear only on one of them.  This is quite hairy and error-prone.
Also, we may end up exposing interface files to the default hierarchy
without thinking it through.

cgroup_subsys will grow two separate cftype addition functions and
apply each only on the hierarchies of the matching type.  This will
allow organizing cftypes in a lot clearer way and encourage subsystems
to scrutinize the interface which is being exposed in the new default
hierarchy.

In preparation, this patch adds cgroup_add_legacy_cftypes() which
currently is a simple wrapper around cgroup_add_cftypes() and replaces
all cgroup_add_cftypes() usages with it.

While at it, this patch drops a completely spurious return from
__hugetlb_cgroup_file_init().

This patch doesn't introduce any functional differences.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

2cf669a5

cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e

由 Tejun Heo 提交于 7月 15, 2014

Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them.  This is quite hairy and error-prone.  Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.

cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type.  This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.

In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes.  This patch is pure rename.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

5577964e

J
Revert "bio: modify __bio_add_page() to accept pages that don't start a new segment" · 26a33794
由 Jens Axboe 提交于 7月 14, 2014
```
This reverts commit 254c4407.

It causes crashes with cryptsetup, even after a few iterations and
updates. Drop it for now.
```
26a33794

14 7月, 2014 1 次提交

block: provide compat ioctl for BLKZEROOUT · 3b3a1814

由 Mikulas Patocka 提交于 7月 02, 2014

This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer
to two uint64_t values, so there is no need to translate it.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# 3.7+
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

3b3a1814

12 7月, 2014 1 次提交

blkcg: don't call into policy draining if root_blkg is already gone · 0b462c89

由 Tejun Heo 提交于 7月 05, 2014

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bc ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NShirish Pargaonkar <spargaonkar@suse.com>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Reported-by: NJet Chen <jet.chen@intel.com>
Cc: stable@vger.kernel.org
Tested-by: NShirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

0b462c89

09 7月, 2014 2 次提交

cgroup: remove sane_behavior support on non-default hierarchies · aa6ec29b

由 Tejun Heo 提交于 7月 09, 2014

sane_behavior has been used as a development vehicle for the default
unified hierarchy.  Now that the default hierarchy is in place, the
flag became redundant and confusing as its usage is allowed on all
hierarchies.  There are gonna be either the default hierarchy or
legacy ones.  Let's make that clear by removing sane_behavior support
on non-default hierarchies.

This patch replaces cgroup_sane_behavior() with cgroup_on_dfl().  The
comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
cgroup_on_dfl() with sane_behavior specific part dropped.

On the default and legacy hierarchies w/o sane_behavior, this
shouldn't cause any behavior differences.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>

aa6ec29b

blkcg, memcg: make blkcg depend on memcg on the default hierarchy · 1ced953b

由 Tejun Heo 提交于 7月 08, 2014

Currently, the blkio subsystem attributes all of writeback IOs to the
root.  One of the issues is that there's no way to tell who originated
a writeback IO from block layer.  Those IOs are usually issued
asynchronously from a task which didn't have anything to do with
actually generating the dirty pages.  The memory subsystem, when
enabled, already keeps track of the ownership of each dirty page and
it's desirable for blkio to piggyback instead of adding its own
per-page tag.

cgroup now has a mechanism to express such dependency -
cgroup_subsys->depends_on.  This patch declares that blkcg depends on
memcg so that memcg is enabled automatically on the default hierarchy
when available.  Future changes will make blkcg map the memcg tag to
find out the cgroup to blame for writeback IOs.

As this means that a memcg may be made invisible, this patch also
implements css_reset() for memcg which resets its basic
configurations.  This implementation will probably need to be expanded
to cover other states which are used in the default hierarchy.

v2: blkcg's dependency on memcg is wrapped with CONFIG_MEMCG to avoid
    build failure.  Reported by kbuild test robot.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>

1ced953b

08 7月, 2014 1 次提交

block: don't assume last put of shared tags is for the host · d45b3279

由 Christoph Hellwig 提交于 7月 08, 2014

There is no inherent reason why the last put of a tag structure must be
the one for the Scsi_Host, as device model objects can be held for
arbitrary periods.  Merge blk_free_tags and __blk_free_tags into a single
funtion that just release a references and get rid of the BUG() when the
host reference wasn't the last.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <axboe@fb.com>

d45b3279

02 7月, 2014 11 次提交

bio: modify __bio_add_page() to accept pages that don't start a new segment · 254c4407

由 Maurizio Lombardi 提交于 7月 01, 2014

The original behaviour is to refuse to add a new page if the maximum
number of segments has been reached, regardless of the fact the page we
are going to add can be merged into the last segment or not.

Unfortunately, when the system runs under heavy memory fragmentation
conditions, a driver may try to add multiple pages to the last segment.
The original code won't accept them and EBUSY will be reported to
userspace.

This patch modifies the function so it refuses to add a page only in case
the latter starts a new segment and the maximum number of segments has
already been reached.

The bug can be easily reproduced with the st driver:

1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
2) modprobe st buffer_kbs=1024
3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
   dd: error writing `/dev/st0': Device or resource busy

[ming.lei@canonical.com: update bi_iter.bi_size before recounting segments]
Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Tested-by: NDongsu Park <dongsu.park@profitbricks.com>
Tested-by: NJet Chen <jet.chen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

254c4407

block SG_IO: add SG_FLAG_Q_AT_HEAD flag · d1515613

由 Douglas Gilbert 提交于 7月 01, 2014

After the SG_IO ioctl was copied into the block layer and
later into the bsg driver, subtle differences emerged.

One difference is the way injected commands are queued through
the block layer (i.e. this is not SCSI device queueing nor SATA
NCQ). Summarizing:
  - SG_IO on block layer device: blk_exec*(at_head=false)
  - sg device SG_IO: at_head=true
  - bsg device SG_IO: at_head=true

Some time ago Boaz Harrosh introduced a sg v4 flag called
BSG_FLAG_Q_AT_TAIL to override the bsg driver default. A
recent patch titled: "sg: add SG_FLAG_Q_AT_TAIL flag"
allowed the sg driver default to be overridden. This patch
allows a SG_IO ioctl sent to a block layer device to have
its default overridden.

ChangeLog:
    - introduce SG_FLAG_Q_AT_HEAD flag in sg.h to cause
      commands that are injected via a block layer
      device SG_IO ioctl to set at_head=true
    - make comments clearer about queueing in sg.h since the
      header is used both by the sg device and block layer
      device implementations of the SG_IO ioctl.
    - introduce BSG_FLAG_Q_AT_HEAD in bsg.h for compatibility
      (it does nothing) and update comments.
Signed-off-by: NDouglas Gilbert <dgilbert@interlog.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMike Christie <michaelc@cs.wisc.edu>
Signed-off-by: NJens Axboe <axboe@fb.com>

d1515613

block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge · 9b4231bf

由 Akinobu Mita 提交于 5月 25, 2014

SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved
buffer in bytes as int type.  The value needs to be capped at the request
queue's max_sectors.  But integer overflow is not correctly handled in
the calculation when converting max_sectors from sectors to bytes.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: linux-scsi@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

9b4231bf

block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX · 63f26496

由 Akinobu Mita 提交于 5月 25, 2014

BLKSECTGET ioctl loads the request queue's max_sectors as unsigned
short value to the argument pointer.  So if the max_sector is greater
than USHRT_MAX, the upper 16 bits of that is just discarded.

In such case, USHRT_MAX is more preferable than the lower 16 bits of
max_sectors.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: linux-scsi@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

63f26496

block/partitions/efi.c: kerneldoc fixing · 16e15565

由 Fabian Frederick 提交于 6月 12, 2014

Adding function documentation and fixing kerneldoc warnings
('field: description' uniformization).

Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

16e15565

block/partitions/msdos.c: code clean-up · dce14c23

由 Fabian Frederick 提交于 6月 12, 2014

checkpatch fixing:
WARNING: Missing a blank line after declarations
WARNING: space prohibited between function name and open parenthesis '('
ERROR: spaces required around that '<' (ctx:VxV)

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

dce14c23

block/partitions/amiga.c: replace nolevel printk by pr_err · 600ffc5e

由 Fabian Frederick 提交于 6月 12, 2014

Also add no prefix pr_fmt to avoid any future default format update

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

600ffc5e

block/partitions/aix.c: replace count*size kzalloc by kcalloc · 472d5e2a

由 Fabian Frederick 提交于 6月 12, 2014

kcalloc manages count*sizeof overflow.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJens Axboe <axboe@fb.com>

472d5e2a

bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload · cbcd1054

由 Gu Zheng 提交于 7月 01, 2014

Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
Martin introduces the function bip_integrity_vecs(get the useful vectors)
to fix the issue about nr_vecs for inline integrity vectors that reported
by David Milburn.

But it seems that bip_integrity_vecs() will return the wrong number if the
bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
because in that case, the bip_inline_vecs[0] is malloced directly.  So
here we add the bip_max_vcnt to record the count of vector slots, and
cleanup the function bip_integrity_vecs().
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

cbcd1054

blk-mq: use percpu_ref for mq usage count · add703fd

由 Tejun Heo 提交于 7月 01, 2014

Currently, blk-mq uses a percpu_counter to keep track of how many
usages are in flight.  The percpu_counter is drained while freezing to
ensure that no usage is left in-flight after freezing is complete.
blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
per-cpu gating mechanism.

This type of code has relatively high chance of subtle bugs which are
extremely difficult to trigger and it's way too hairy to be open coded
in blk-mq.  percpu_ref can serve the same purpose after the recent
changes.  This patch replaces the open-coded per-cpu usage counting
and draining mechanism with percpu_ref.

blk_mq_queue_enter() performs tryget_live on the ref and exit()
performs put.  blk_mq_freeze_queue() kills the ref and waits until the
reference count reaches zero.  blk_mq_unfreeze_queue() revives the ref
and wakes up the waiters.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

add703fd

blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue() · 72d6f02a

由 Tejun Heo 提交于 7月 01, 2014

Keeping __blk_mq_drain_queue() as a separate function doesn't buy us
anything and it's gonna be further simplified.  Let's flatten it into
its caller.

This patch doesn't make any functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

72d6f02a