1. 05 Aug 2014: 2 commits
  2. 12 Jun 2014: 2 commits
  3. 10 Jun 2014: 1 commit
    • raid5: speedup sync_request processing · 053f5b65
      By Eivind Sarto
      The raid5 sync_request() processing calls handle_stripe() within the context of
      the resync-thread.  The resync-thread issues the first set of read requests
      and this adds execution latency and slows down the scheduling of the next
      sync_request().
      The current rebuild/resync speed of raid5 is not much faster than what
      rotational HDDs can sustain.
      Testing the following patch on a 6-drive array, I can increase the rebuild
      speed from 100 MB/s to 175 MB/s.
      The sync_request() now just sets STRIPE_HANDLE and releases the stripe.  This
      creates some more parallelism between the resync-thread and raid5 kernel daemon.
      Signed-off-by: Eivind Sarto <esarto@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      053f5b65
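      A minimal sketch of the change described above (not the actual diff; the
      surrounding resync loop, locking and error handling are elided, and the
      names follow drivers/md/raid5.c of that era):

        /* Before, the resync thread handled the stripe inline:
         *     handle_stripe(sh);
         *     release_stripe(sh);
         * After, it only marks the stripe and defers the work to raid5d,
         * so the next sync_request() can be scheduled sooner. */
        set_bit(STRIPE_HANDLE, &sh->state);
        release_stripe(sh);
        return STRIPE_SECTORS;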
  4. 05 Jun 2014: 1 commit
    • md/raid5: deadlock between retry_aligned_read with barrier io · 2844dc32
      By Hui Jiao
      A chunk-aligned read increases the counter active_aligned_reads and
      decreases it after the sub-device handles it successfully. But when a read
      error occurs, the read is redispatched by raid5d, and
      active_aligned_reads will not be decreased until we can grab a stripe
      head in retry_aligned_read. Now suppose a barrier IO comes in, sets
      conf->quiesce to 2, and waits until both active_stripes and
      active_aligned_reads are zero. The retried chunk-aligned read gets
      stuck at get_active_stripe, waiting until conf->quiesce becomes 0.
      retry_aligned_read and the barrier IO are now waiting for each other.
      One possible solution is to ignore conf->quiesce and let the retried
      aligned read finish. I reproduced this deadlock and tested this patch on
      CentOS 6.0.
      Signed-off-by: NeilBrown <neilb@suse.de>
      2844dc32
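      A hedged sketch of the idea (the argument names for get_active_stripe()
      are recalled from the raid5 code of that era and are illustrative): the
      stripe lookup done for the retried aligned read must not block on
      conf->quiesce, otherwise it deadlocks with the barrier that is waiting
      for active_aligned_reads to reach zero.

        /* retry_aligned_read(): pass noquiesce so the retried read can make
         * progress even while a barrier holds conf->quiesce at 2. */
        sh = get_active_stripe(conf, sector, 0 /* previous */,
                               1 /* noblock */, 1 /* noquiesce */);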
  5. 04 Jun 2014: 9 commits
  6. 29 May 2014: 7 commits
    • raid5: add an option to avoid copy data from bio to stripe cache · d592a996
      By Shaohua Li
      The stripe cache has two goals:
      1. cache data, so that next time, if the data can be found in the stripe cache,
      disk access can be avoided.
      2. stable data. Data is copied from the bio to the stripe cache and parity is
      calculated from it. Data written to disk comes from the stripe cache, so if the
      upper layer changes the bio data, the data written to disk isn't impacted.
      
      In my environment, I can guarantee 2 will not happen. And BDI_CAP_STABLE_WRITES
      can guarantee 2 too. As for 1, it's not common either: the block plug mechanism
      will dispatch a bunch of sequential small requests together, and since I'm using
      SSDs, I'm using a small chunk size. It's a rare case where the stripe cache is
      really useful.
      
      So I'd like to avoid the copy from bio to stripe cache, and it's very helpful
      for performance. In my 1M randwrite tests, avoiding the copy increases
      performance by more than 30%.
      
      Of course, this shouldn't be enabled by default. It has been reported before
      that enabling BDI_CAP_STABLE_WRITES can harm some workloads, so I added an
      option to control it.
      
      Neilb:
        changed BUG_ON to WARN_ON
        Removed some assignments from raid5_build_block which are now not needed.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      d592a996
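      A rough sketch of how such an option can take effect (illustrative only:
      the skip_copy/R5_SkipCopy names are recalled from this patch, while
      write_covers_whole_page() and copy_bio_data_to_stripe_page() are
      hypothetical placeholders for the real helpers): when the option is on
      and the write fully overwrites the stripe page, the stripe points at the
      bio's page instead of copying the data.

        if (conf->skip_copy &&
            test_bit(R5_OVERWRITE, &dev->flags) &&
            write_covers_whole_page(wbi)) {             /* hypothetical helper */
                set_bit(R5_SkipCopy, &dev->flags);
                dev->page = bio_page(wbi);              /* reuse the bio's page */
        } else {
                copy_bio_data_to_stripe_page(wbi, dev); /* the old copy path */
        }

      Per the patch, the option is a per-array control rather than a default,
      so it should only be turned on where the upper layer (or
      BDI_CAP_STABLE_WRITES) guarantees stable pages.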
    • md/bitmap: remove confusing code from filemap_get_page. · f2e06c58
      By NeilBrown
      file_page_index(store, 0) is *always* 0.
      This is because the bitmap sb, at 256 bytes, is *always* less than
      one page.
      So subtracting it has no effect and the code should be removed.
      Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      f2e06c58
    • raid5: avoid release list until last reference of the stripe · cf170f3f
      By Eivind Sarto
      The (lockless) release_list reduces lock contention, but there is excessive
      queueing and dequeuing of stripes on this list.  A stripe will currently be
      queued on the release_list with a stripe reference count > 1.  This can cause
      the raid5 kernel thread(s) to dequeue the stripe and decrement the refcount
      without doing any other useful processing of the stripe.  There are two cases
      when the stripe can be put on the release_list multiple times before it is
      actually handled by the kernel thread(s).
      1) make_request() activates the stripe processing in 4k increments.  When a
         write request is large enough to span multiple chunks of a stripe_head, the
         first 4k chunk adds the stripe to the plug list.  The next 4k chunk that is
         processed for the same stripe puts the stripe on the release_list with a
         refcount=2.  This can cause the kernel thread to process and decrement the
         stripe before the stripe is unplugged, which again will put it back on the
         release_list.
      2) Whenever IO is scheduled on a stripe (pre-read and/or write), the stripe
         refcount is set to the number of active IO (for each chunk).  The stripe is
         released as each IO completes, and can be queued and dequeued multiple times
         on the release_list, until its refcount finally reaches zero.
      
      This simple patch will ensure a stripe is only queued on the release_list when
      its refcount=1 and is ready to be handled by the kernel thread(s).  I added some
      instrumentation to raid5 and counted the number of times stripes were queued on
      the release_list for a variety of write IO sizes.  Without this patch the number
      of times stripes got queued on the release_list was 100-500% higher than with
      the patch.  The excess queuing will increase with the IO size.  The patch also
      improved throughput by 5-10%.
      Signed-off-by: Eivind Sarto <esarto@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      cf170f3f
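      A minimal sketch of the guard this adds in release_stripe() (simplified;
      the real fast path uses a lockless llist plus a slow path under
      device_lock, both elided here, and queue_on_release_list() is a
      placeholder):

        static void release_stripe(struct stripe_head *sh)
        {
                /* If the count is still > 1, just drop our reference: the
                 * stripe is not yet idle and somebody else will queue it. */
                if (atomic_add_unless(&sh->count, -1, 1))
                        return;

                /* count == 1 here: this is the last reference, so queueing
                 * on the release_list now does useful work for raid5d. */
                queue_on_release_list(sh);
        }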
    • md: md_clear_badblocks should return an error code on failure. · 8b32bf5e
      By NeilBrown
      Julia Lawall and coccinelle report that md_clear_badblocks always
      returns 0, despite appearing to have an error path.
      The error path really should return an error code.  ENOSPC is
      reasonably appropriate.
      Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NeilBrown <neilb@suse.de>
      8b32bf5e
    • md/raid56: Don't perform reads to support writes until stripe is ready. · 67f45548
      By NeilBrown
      If it is found that we need to pre-read some blocks before a write
      can succeed, we normally set STRIPE_DELAYED and don't actually perform
      the read until STRIPE_PREREAD_ACTIVE subsequently gets set.
      
      However for a degraded RAID6 we currently perform the reads as soon
      as we see that a write is pending.  This significantly hurts
      throughput.
      
      So:
       - when handle_stripe_dirtying finds a block that it wants on a device
         that has failed, set STRIPE_DELAYED instead of doing nothing, and
       - when fetch_block detects that a read might be required to satisfy a
         write, only perform the read if STRIPE_PREREAD_ACTIVE is set,
         and if we would actually need to read something to complete the write.
      
      This also helps RAID5, though less often, as RAID5 supports a
      read-modify-write cycle.  For RAID5 the read is performed too early
      only if the write is not a full 4K aligned write (i.e. not an
      R5_OVERWRITE).
      
      Also clean up a couple of horrible bits of formatting.
      Reported-by: Patrik Horník <patrik@dsl.sk>
      Signed-off-by: NeilBrown <neilb@suse.de>
      67f45548
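      A hedged sketch of the gating added in fetch_block() (simplified;
      need_read_for_write stands in for the real predicate and is
      hypothetical): a read whose only purpose is to enable a pending write is
      deferred until the stripe has been through the delayed list.

        if (need_read_for_write) {
                if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
                        set_bit(R5_Wantread, &dev->flags);   /* read now */
                else
                        set_bit(STRIPE_DELAYED, &sh->state); /* wait first */
        }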
    • md: refuse to change shape of array if it is active but read-only · bd8839e0
      By NeilBrown
      read-only arrays should not be changed.  This includes changing
      the level, layout, size, or number of devices.
      
      So reject those changes for readonly arrays.
      Signed-off-by: NeilBrown <neilb@suse.de>
      bd8839e0
    • md: always set MD_RECOVERY_INTR when interrupting a reshape thread. · 2ac295a5
      By NeilBrown
      Commit 8313b8e5
         md: fix problem when adding device to read-only array with bitmap.
      
      added a call to md_reap_sync_thread() which can cause a reshape thread
      to be interrupted (in particular, it could cause md_thread() to never even
      call md_do_sync()).
      However it didn't set MD_RECOVERY_INTR, so ->finish_reshape() would not
      know that the reshape didn't complete.
      
      This only happens when mddev->ro is set, and normally reshape threads
      don't run in that situation.  But raid5 and raid10 can start a reshape
      thread during "run" if the array is in the middle of a reshape.
      They do this even if ->ro is set.
      
      So it is best to set MD_RECOVERY_INTR before aborting the
      sync thread, just in case.
      
      Though it is rare for this to trigger a problem, it can cause data corruption
      because the reshape isn't finished properly.
      So it is suitable for any -stable kernel to which the offending commit was
      applied (3.2 or later).
      
      Fixes: 8313b8e5
      Cc: stable@vger.kernel.org (3.2+)
      Signed-off-by: NeilBrown <neilb@suse.de>
      2ac295a5
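      The essence of the fix, as a sketch (the surrounding md.c context is
      elided): record the interruption before reaping the thread, so
      ->finish_reshape() does not treat the aborted reshape as complete.

        /* Before reaping the sync/reshape thread, mark it interrupted;
         * otherwise finish_reshape() assumes the reshape finished cleanly. */
        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
        md_reap_sync_thread(mddev);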
  7. 28 May 2014: 1 commit
    • md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync". · 3991b31e
      By NeilBrown
      If mddev->ro is set, md_do_sync will (correctly) abort.
      However in that case MD_RECOVERY_INTR isn't set.
      
      If a RESHAPE had been requested, then ->finish_reshape() will be
      called and it will think the reshape was successful even though
      nothing happened.
      
      Normally a resync will not be requested if ->ro is set, but if an
      array is stopped while a reshape is on-going, then when the array is
      started, the reshape will be restarted.  If the array is also set
      read-only at this point, the reshape will instantly appear to succeed,
      resulting in data corruption.
      
      Consequently, this patch is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org (any)
      Signed-off-by: NeilBrown <neilb@suse.de>
      3991b31e
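      A sketch of the corresponding early exit in md_do_sync() (simplified):
      a read-only array never syncs, and the abort must now be flagged so a
      restarted reshape is not reported as successful.

        if (mddev->ro) {
                /* never try to sync/reshape a read-only array; make sure
                 * finish_reshape() sees an interruption, not a success */
                set_bit(MD_RECOVERY_INTR, &mddev->recovery);
                return;
        }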
  8. 27 May 2014: 2 commits
  9. 21 May 2014: 1 commit
    • dm thin: add 'no_space_timeout' dm-thin-pool module param · 80c57893
      By Mike Snitzer
      Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode
      holding IO forever") introduced a fixed 60 second timeout.  Users may
      want to either disable or modify this timeout.
      
      Allow the out-of-data-space timeout to be configured using the
      'no_space_timeout' dm-thin-pool module param.  Setting it to 0 will
      disable the timeout, resulting in IO being queued until more data space
      is added to the thin-pool.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      80c57893
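      A hedged sketch of how such a module parameter is typically declared in
      dm-thin.c (the variable name shown here is illustrative):

        /* out-of-data-space queueing timeout, in seconds; 0 disables it */
        static unsigned no_space_timeout_secs = 60;

        module_param_named(no_space_timeout, no_space_timeout_secs, uint,
                           S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(no_space_timeout,
                         "Out of data space queue IO timeout in seconds");

      It can then be set at load time (e.g. modprobe dm_thin_pool
      no_space_timeout=0) or at runtime through
      /sys/module/dm_thin_pool/parameters/no_space_timeout, assuming the usual
      dm_thin_pool module name.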
  10. 15 May 2014: 4 commits
    • dm mpath: fix lock order inconsistency in multipath_ioctl · 4cdd2ad7
      By Mike Snitzer
      Commit 3e9f1be1 ("dm mpath: remove process_queued_ios()") did not
      consistently take the multipath device's spinlock (m->lock) before
      calling dm_table_run_md_queue_async() -- which takes the q->queue_lock.
      
      Found by code inspection, using a hint from the reported lockdep warning.
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      4cdd2ad7
    • dm thin: add timeout to stop out-of-data-space mode holding IO forever · 85ad643b
      By Joe Thornber
      If the pool runs out of data space, dm-thin can be configured to
      either error IOs that would trigger provisioning, or hold those IOs
      until the pool is resized.  Unfortunately, holding IOs until the pool is
      resized can result in a cascade of tasks hitting the hung_task_timeout,
      which may render the system unavailable.
      
      Add a fixed timeout so IOs can only be held for a maximum of 60 seconds.
      If LVM is going to resize a thin-pool that is out of data space it needs
      to be prompt about it.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      85ad643b
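      A rough sketch of the mechanism (function and field names are recalled
      from dm-thin.c and should be treated as illustrative): entering
      out-of-data-space mode queues a delayed work item, and if the pool has
      not been resized when it fires, held IO stops being queued.

        #define NO_SPACE_TIMEOUT (HZ * 60)

        static void do_no_space_timeout(struct work_struct *ws)
        {
                struct pool *pool = container_of(to_delayed_work(ws),
                                                 struct pool, no_space_timeout);

                /* still out of data space after 60s: stop queueing IO */
                if (get_pool_mode(pool) == PM_OUT_OF_DATA_SPACE &&
                    !pool->pf.error_if_no_space)
                        set_pool_mode(pool, PM_READ_ONLY);
        }

        /* ...and when the pool enters PM_OUT_OF_DATA_SPACE mode... */
        queue_delayed_work(pool->wq, &pool->no_space_timeout, NO_SPACE_TIMEOUT);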
    • dm thin: allow metadata commit if pool is in PM_OUT_OF_DATA_SPACE mode · 8d07e8a5
      By Joe Thornber
      Commit 3e1a0699 ("dm thin: fix out of data space handling") introduced
      a regression in the metadata commit() method by returning an error if
      the pool is in PM_OUT_OF_DATA_SPACE mode.  This oversight caused a thin
      device to return errors even if the default queue_if_no_space ENOSPC
      handling mode is used.
      
      Fix commit() to only fail if pool is in PM_READ_ONLY or PM_FAIL mode.
      
      Reported-by: qindehua@163.com
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      8d07e8a5
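      A sketch of the corrected guard in the pool's commit() method
      (simplified; only the mode check matters here): PM_OUT_OF_DATA_SPACE may
      still commit metadata, while PM_READ_ONLY and PM_FAIL may not.

        static int commit(struct pool *pool)
        {
                int r;

                /* only read-only and failed pools must refuse to commit */
                if (get_pool_mode(pool) >= PM_READ_ONLY)
                        return -EINVAL;

                r = dm_pool_commit_metadata(pool->pmd);
                if (r)
                        metadata_operation_failed(pool, "dm_pool_commit_metadata", r);

                return r;
        }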
    • dm crypt: fix cpu hotplug crash by removing per-cpu structure · 610f2de3
      By Mikulas Patocka
      The DM crypt target used per-cpu structures to hold pointers to an
      ablkcipher_request structure.  The code assumed that the work item keeps
      executing on a single CPU, so it didn't use synchronization when
      accessing this structure.
      
      If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online,
      the work item could be moved to another CPU.  This causes dm-crypt
      crashes, like the following, because the code starts using an incorrect
      ablkcipher_request:
      
       smpboot: CPU 7 is now offline
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000130
       IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt]
       ...
       Call Trace:
        [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt]
        [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0
        [<ffffffff81052a28>] ? process_one_work+0x168/0x470
        [<ffffffff8105366b>] ? worker_thread+0x10b/0x390
        [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290
        [<ffffffff81058d9f>] ? kthread+0xaf/0xc0
        [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120
        [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0
        [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120
      
      Fix this bug by removing the per-cpu definition.  The structure
      ablkcipher_request is accessed via a pointer from convert_context.
      Consequently, if the work item is rescheduled to a different CPU, the
      thread still uses the same ablkcipher_request.
      
      This change may undermine performance improvements intended by commit
      c0297721 ("dm crypt: scale to multiple cpus") on select hardware.  In
      practice no performance difference was observed on recent hardware.  But
      regardless, correctness is more important than performance.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      610f2de3
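      A sketch of the resulting layout (simplified from the dm-crypt.c of that
      era): the ablkcipher_request now travels with the per-request
      convert_context instead of living in a per-cpu structure, so it stays
      valid if the work item migrates to another CPU.

        struct convert_context {
                struct completion restart;
                struct bio *bio_in;
                struct bio *bio_out;
                /* ... bvec iterators and sector elided ... */
                atomic_t cc_pending;
                struct ablkcipher_request *req; /* was per-cpu before this fix */
        };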
  11. 06 May 2014: 2 commits
    • md: avoid possible spinning md thread at shutdown. · 0f62fb22
      By NeilBrown
      If an md array with externally managed metadata (e.g. DDF or IMSM)
      is in use, then we should not set safemode==2 at shutdown because:
      
      1/ this is ineffective: user-space needs to be involved in any 'safemode' handling,
      2/ The safemode management code doesn't cope with safemode==2 on external metadata
         and md_check_recovery enters an infinite loop.
      
      Even at shutdown, an infinite-looping process can be problematic, so this
      could cause shutdown to hang.
      
      Cc: stable@vger.kernel.org (any kernel)
      Signed-off-by: NeilBrown <neilb@suse.de>
      0f62fb22
    • md/raid10: call wait_barrier() for each request submitted. · cc13b1d1
      By NeilBrown
      wait_barrier() includes a counter, so we must call it precisely once
      (unless balanced by allow_barrier()) for each request submitted.
      
      Since
      commit 20d0189b
          block: Introduce new bio_split()
      in 3.14-rc1, we don't call it for the extra requests generated when
      we need to split a bio.
      
      When this happens the counter goes negative, any resync/recovery will
      never start, and "mdadm --stop" will hang.
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Fixes: 20d0189b
      Cc: stable@vger.kernel.org (3.14+)
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      cc13b1d1
  12. 02 May 2014: 1 commit
  13. 29 Apr 2014: 1 commit
  14. 18 Apr 2014: 1 commit
  15. 17 Apr 2014: 1 commit
  16. 16 Apr 2014: 2 commits
    • block: remove struct request buffer member · b4f42e28
      By Jens Axboe
      This was used in the olden days, back when onions were proper
      yellow. Basically it mapped to the current buffer to be
      transferred. With highmem being added more than a decade ago,
      most drivers map pages out of a bio, and rq->buffer isn't
      pointing at anything valid.
      
      Convert old style drivers to just use bio_data().
      
      For the discard payload use case, just reference the page
      in the bio.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b4f42e28
    • dm verity: fix biovecs hash calculation regression · 3a774521
      By Milan Broz
      Commit 003b5c57 ("block: Convert drivers
      to immutable biovecs") incorrectly converted biovec iteration in
      dm-verity to always calculate the hash from a full biovec, but the
      function only needs to calculate the hash from part of the biovec (up to
      the calculated "todo" value).
      
      Fix this issue by limiting hash input to only the requested data size.
      
      This problem was identified using the cryptsetup regression test for
      veritysetup (verity-compat-test).
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      3a774521
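      A sketch of the fix in the verity hashing loop (simplified; variable
      names follow the surrounding dm-verity code of that era): the length fed
      to the hash is clamped to the remaining "todo" bytes instead of always
      covering the whole biovec.

        struct bio_vec bv = bio_iter_iovec(bio, io->iter);
        unsigned len = bv.bv_len;
        u8 *page;

        if (likely(len >= todo))
                len = todo;             /* hash only what this block needs */

        page = kmap_atomic(bv.bv_page);
        r = crypto_shash_update(desc, page + bv.bv_offset, len);
        kunmap_atomic(page);
        bio_advance_iter(bio, &io->iter, len);
        todo -= len;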
  17. 09 Apr 2014: 2 commits
    • raid5: get_active_stripe avoids device_lock · e240c183
      By Shaohua Li
      For sequential workloads (or workloads with big request sizes), get_active_stripe
      can find a cached stripe.  In this case we always hold device_lock, which exposes
      a lot of lock contention for such workloads.  If the stripe count isn't 0, we
      don't actually need to hold the lock, since we just increase the count.  And this
      is the hot code path for such workloads.  Unfortunately we must delete the BUG_ON.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e240c183
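      A hedged sketch of the lock-avoiding fast path in get_active_stripe()
      (heavily simplified; the retry/quiesce handling is omitted and
      init_stripe_from_lru() is a placeholder): a cached stripe with a
      non-zero count is taken with a bare atomic increment, and device_lock is
      only needed when the count is zero.

        sh = __find_stripe(conf, sector, conf->generation - previous);
        if (sh && !atomic_inc_not_zero(&sh->count)) {
                /* count was 0: the stripe may sit on an lru list, so the
                 * slower, locked path is still required */
                spin_lock(&conf->device_lock);
                if (!atomic_read(&sh->count))
                        init_stripe_from_lru(sh);
                atomic_inc(&sh->count);
                spin_unlock(&conf->device_lock);
        }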
    • raid5: make_request does less prepare wait · 27c0f68f
      By Shaohua Li
      On a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a
      lot of contention for sequential workloads (or workloads with big request
      sizes).  For such workloads, each bio includes several stripes, so we
      can just do prepare_to_wait/finish_wait once for the whole bio instead
      of for every stripe.  This removes the lock contention completely for such
      workloads.  Random workloads might have similar lock contention too,
      but I haven't seen it yet, maybe because my storage is still not fast
      enough.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      27c0f68f
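      A hedged sketch of the reworked wait pattern in make_request() (heavily
      simplified; stripe_busy is a placeholder condition): prepare_to_wait()
      runs once per bio, and is re-armed inside the per-stripe loop only after
      an actual schedule().

        DEFINE_WAIT(w);
        bool do_prepare;

        prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
        for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
                do_prepare = false;
        retry:
                if (do_prepare)
                        prepare_to_wait(&conf->wait_for_overlap, &w,
                                        TASK_UNINTERRUPTIBLE);
                /* ... try to get and queue the stripe for this 4k chunk ... */
                if (stripe_busy) {
                        schedule();
                        do_prepare = true;
                        goto retry;
                }
        }
        finish_wait(&conf->wait_for_overlap, &w);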