1. 19 Aug 2014, 2 commits
    • md/raid10: Fix memory leak when raid10 reshape completes. · b3968552
      Committed by NeilBrown
      When a raid10 commences a resync/recovery/reshape it allocates
      some buffer space.
      When a resync or recovery completes, that buffer space is freed,
      but it is not freed when a reshape completes.
      This can result in a small memory leak.
      
      There is a subtle side-effect of this bug.  When a RAID10 is reshaped
      to a larger array (more devices), the reshape is immediately followed
      by a "resync" of the new space.  This "resync" will use the buffer
      space which was allocated for "reshape".  This can cause problems
      including a "BUG" in the SCSI layer.  So this is suitable for -stable.
      
      Cc: stable@vger.kernel.org (v3.5+)
      Fixes: 3ea7daa5
      Signed-off-by: NeilBrown <neilb@suse.de>
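
      The pattern behind this fix is easy to illustrate outside the kernel: the
      per-sync buffer pool has to be released on every completion path, the
      reshape path included.  The following is a minimal, self-contained C
      sketch of that pattern only; sync_pool, start_sync, finish_resync and
      finish_reshape are hypothetical names for illustration, not the raid10.c
      code or the actual patch.

          #include <stdlib.h>

          /* Hypothetical stand-in for the buffer pool raid10 allocates when a
           * resync/recovery/reshape starts. */
          struct sync_pool {
                  void *buffers;
          };

          static struct sync_pool *start_sync(size_t bytes)
          {
                  struct sync_pool *p = malloc(sizeof(*p));

                  if (!p)
                          return NULL;
                  p->buffers = malloc(bytes);
                  return p;
          }

          static void close_sync(struct sync_pool *p)
          {
                  if (!p)
                          return;
                  free(p->buffers);
                  free(p);
          }

          /* The resync/recovery completion path already released the pool... */
          static void finish_resync(struct sync_pool *p)
          {
                  close_sync(p);
          }

          /* ...but the reshape completion path did not, leaking it.  The fix is
           * to release the pool here as well. */
          static void finish_reshape(struct sync_pool *p)
          {
                  /* end_reshape()-style bookkeeping would happen here. */
                  close_sync(p);          /* without this, the pool leaks */
          }

          int main(void)
          {
                  finish_resync(start_sync(4096));
                  finish_reshape(start_sync(4096));
                  return 0;
          }
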
    • md/raid10: fix memory leak when reshaping a RAID10. · ce0b0a46
      Committed by NeilBrown
      raid10 reshape clears unwanted bits from a bio's bi_flags using
      a method which, while clumsy, worked until 3.10, when BIO_OWNS_VEC
      was added.
      Since then it has also been clearing that bit, which it shouldn't.
      This results in a memory leak.
      
      So change to use the approved method of clearing unwanted bits (see
      the sketch after this entry).
      
      As this causes a memory leak which can consume all of memory,
      the fix is suitable for -stable.
      
      Fixes: a38352e0
      Cc: stable@vger.kernel.org (v3.10+)
      Reported-by: mdraid.pkoch@dfgh.net (Peter Koch)
      Signed-off-by: NeilBrown <neilb@suse.de>
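
      A hedged illustration of the difference between the two approaches:
      rebuilding a flags word wholesale can silently drop a bit that records
      ownership of memory, while masking off only the unwanted bits preserves
      it.  The struct and flag names below are hypothetical stand-ins
      (FLAG_OWNS_BUFFER plays the role of BIO_OWNS_VEC); this is not the
      block-layer code.

          #include <stdio.h>
          #include <stdlib.h>

          #define FLAG_UPTODATE           (1u << 0)
          #define FLAG_SEG_VALID          (1u << 1)
          #define FLAG_OWNS_BUFFER        (1u << 2)  /* "must free its buffer" */

          struct obj {
                  unsigned int flags;
                  void *buffer;
          };

          static void release(struct obj *o)
          {
                  if (o->flags & FLAG_OWNS_BUFFER)
                          free(o->buffer);  /* skipped if the ownership bit was lost */
          }

          int main(void)
          {
                  struct obj o = {
                          .flags  = FLAG_SEG_VALID | FLAG_OWNS_BUFFER,
                          .buffer = malloc(4096),
                  };

                  /* Clumsy: rebuild the whole flags word, silently dropping
                   * FLAG_OWNS_BUFFER and therefore leaking o.buffer:
                   *         o.flags = FLAG_UPTODATE;
                   */

                  /* Approved: clear only the bits that really must go. */
                  o.flags &= ~FLAG_SEG_VALID;
                  o.flags |= FLAG_UPTODATE;

                  release(&o);              /* ownership bit intact, buffer freed */
                  printf("flags = 0x%x\n", o.flags);
                  return 0;
          }
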
  2. 18 Aug 2014, 2 commits
    • md/raid6: avoid data corruption during recovery of double-degraded RAID6 · 9c4bdf69
      Committed by NeilBrown
      During recovery of a double-degraded RAID6 it is possible for
      some blocks not to be recovered properly, leading to corruption.
      
      If a write happens to one block in a stripe that would be written to a
      missing device, and at the same time that stripe is recovering data
      to the other missing device, then that recovered data may not be written.
      
      This patch skips, in the double-degraded case, an optimisation that is
      only safe for single-degraded arrays.
      
      The bug was introduced in 2.6.32 and the fix is suitable for any kernel
      since then.  In an older kernel with separate handle_stripe5() and
      handle_stripe6() functions the patch must change handle_stripe6().
      
      Cc: stable@vger.kernel.org (2.6.32+)
      Fixes: 6c0069c0
      Cc: Yuri Tikhonov <yur@emcraft.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in>
      Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in>
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
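
      The essence of the change is a guard: an optimisation that is valid with
      at most one missing device must not run when two are missing and the
      stripe is recovering.  Below is a minimal, self-contained sketch of that
      gating logic; stripe_state, failed and can_use_fast_path are hypothetical
      names for illustration, not the handle_stripe code.

          #include <stdbool.h>
          #include <stdio.h>

          /* Hypothetical per-stripe state: 'failed' counts missing devices. */
          struct stripe_state {
                  int failed;             /* 0, 1 or 2 for RAID6 */
                  bool recovering;        /* reconstructing data to a missing device */
          };

          /*
           * The fast path assumes the blocks it skips are already consistent.
           * That assumption only holds when at most one device is missing.
           */
          static bool can_use_fast_path(const struct stripe_state *s)
          {
                  if (s->failed >= 2 && s->recovering)
                          return false;   /* double-degraded: write out recovered data */
                  return true;
          }

          int main(void)
          {
                  struct stripe_state single = { .failed = 1, .recovering = true };
                  struct stripe_state dbl    = { .failed = 2, .recovering = true };

                  printf("single-degraded: %s\n",
                         can_use_fast_path(&single) ? "fast path" : "skip optimisation");
                  printf("double-degraded: %s\n",
                         can_use_fast_path(&dbl) ? "fast path" : "skip optimisation");
                  return 0;
          }
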
    • md/raid5: avoid livelock caused by non-aligned writes. · a40687ff
      Committed by NeilBrown
      If a stripe in a raid6 array receives a write to each data block while
      the array is degraded, and any of these writes to a missing device
      is not page-aligned, then a livelock happens.
      
      In this case the P and Q blocks need to be read so that the part of
      the missing block which is *not* being updated by the write can be
      constructed.  Due to a logic error, these blocks are not loaded, so
      the update cannot proceed and the stripe is 'handled' repeatedly in an
      infinite loop.
      
      This bug is unlikely to be hit, as most writes are page-aligned.
      However, as it can lead to a livelock, the fix is suitable for -stable.
      The bug was introduced in 3.16.
      
      Cc: stable@vger.kernel.org (v3.16)
      Fixes: 67f45548
      Signed-off-by: NeilBrown <neilb@suse.de>
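
      The livelock comes down to a read-modify-write decision: a write that
      covers only part of a block destined for a missing device cannot proceed
      until P and Q have been read, so that the untouched remainder can be
      reconstructed.  The sketch below shows that decision in isolation;
      need_parity_read and its parameters are hypothetical, not the raid5.c
      logic.

          #include <stdbool.h>
          #include <stdio.h>

          #define BLOCK_SIZE 4096u

          /*
           * Must P and Q be read before this write can proceed?  Yes, whenever
           * the target device is missing and the write does not cover the whole
           * block: the unwritten remainder has to be rebuilt from parity first.
           */
          static bool need_parity_read(bool target_missing,
                                       unsigned int write_off, unsigned int write_len)
          {
                  bool full_block = (write_off == 0 && write_len == BLOCK_SIZE);

                  return target_missing && !full_block;
          }

          int main(void)
          {
                  /* Page-aligned, full-block write to a missing device: no read needed. */
                  printf("aligned  : %s\n",
                         need_parity_read(true, 0, BLOCK_SIZE) ? "read P/Q" : "proceed");

                  /* Partial write to a missing device: P/Q must be loaded first;
                   * never loading them is the livelock the patch fixes. */
                  printf("unaligned: %s\n",
                         need_parity_read(true, 512, 1024) ? "read P/Q" : "proceed");
                  return 0;
          }
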
  3. 11 Aug 2014, 1 commit
    • dm table: propagate QUEUE_FLAG_NO_SG_MERGE · 200612ec
      Committed by Jeff Moyer
      Commit 05f1dd53 ("block: add queue flag for disabling SG merging")
      introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
      default in blk_mq_init_queue for mq-enabled devices.  The effect of
      the flag is to bypass the SG segment merging.  Instead, the
      bio->bi_vcnt is used as the number of hardware segments.
      
      With a device mapper target on top of a device with
      QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
      than a driver is prepared to handle.  I ran into this when backporting
      the virtio_blk mq support.  It triggered this BUG_ON in
      virtio_queue_rq:
      
              BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
      
      The queue's max is set here:
              blk_queue_max_segments(q, vblk->sg_elems-2);
      
      Basically, what happens is that a bio is built up for the dm device
      (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
      bio_add_page.  That path will call into __blk_recalc_rq_segments, so
      what you end up with is bi_phys_segments being much smaller than bi_vcnt
      (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
      is submitted, it gets cloned.  When the cloned bio is submitted, it will
      end up in blk_recount_segments, here:
      
              if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                      bio->bi_phys_segments = bio->bi_vcnt;
      
      and now we've set bio->bi_phys_segments to a number that is beyond what
      was registered as queue_max_segments by the driver.
      
      The right way to fix this is to propagate the queue flag up the stack.
      
      The rules for propagating the flag are simple:
      - if the flag is set for any underlying device, it must be set for the
        upper device
      - consequently, if no underlying device has the flag set, it should not
        be set for the upper device (see the sketch after this entry).
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.16+
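
      The propagation rule above reduces to a scan of the underlying devices:
      the stacked device advertises the flag exactly when at least one member
      does.  The following self-contained sketch models that rule only; struct
      queue, struct stacked_dev and propagate_no_sg_merge are hypothetical
      stand-ins, not the dm-table API.

          #include <stdbool.h>
          #include <stdio.h>

          #define MAX_MEMBERS 8

          /* Hypothetical request queue; no_sg_merge models QUEUE_FLAG_NO_SG_MERGE. */
          struct queue {
                  bool no_sg_merge;
          };

          struct stacked_dev {
                  struct queue q;                      /* the upper device's queue */
                  struct queue *members[MAX_MEMBERS];  /* underlying devices */
                  int nr_members;
          };

          /* Set the flag on the upper device iff any underlying device has it. */
          static void propagate_no_sg_merge(struct stacked_dev *d)
          {
                  bool any = false;

                  for (int i = 0; i < d->nr_members; i++)
                          any |= d->members[i]->no_sg_merge;
                  d->q.no_sg_merge = any;
          }

          int main(void)
          {
                  struct queue mq_disk = { .no_sg_merge = true };   /* blk-mq member */
                  struct queue sq_disk = { .no_sg_merge = false };

                  struct stacked_dev dm = {
                          .members    = { &mq_disk, &sq_disk },
                          .nr_members = 2,
                  };

                  propagate_no_sg_merge(&dm);
                  printf("upper device NO_SG_MERGE: %d\n", dm.q.no_sg_merge);
                  return 0;
          }
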
  4. 08 Aug 2014, 3 commits
  5. 05 Aug 2014, 22 commits
  6. 02 Aug 2014, 10 commits