Commits · 6119e6792bcaf926cb284098042a576c1a55b513 · openanolis / cloud-kernel

Nov 10, 2016 · 3 commits

md: remove md_super_wait() call after bitmap_flush() · 6119e679

Committed by NeilBrown on Nov 09, 2016

bitmap_flush() finishes with bitmap_update_sb(), and that finishes
with write_page(..., 1), so write_page() will wait for all writes
to complete.  So there is no point calling md_super_wait()
immediately afterwards.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: define mddev flags, recovery flags and r1bio state bits using enums · be306c29

Committed by NeilBrown on Nov 09, 2016

This is less error-prone than using individual #defines.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

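The sketch below shows why an enum is less error-prone than hand-numbered #defines. Each new enumerator automatically takes the next free bit number, so two flags cannot silently share a bit. A trailing count also allows a compile-time check that all flags fit in one word. The flag names and helpers are hypothetical and illustrative only; they are not md's actual flag names.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag names: each enumerator gets the next bit number
 * automatically, unlike hand-numbered #defines that can collide. */
enum array_flag_model {
	AF_CHANGE_DEVS,     /* bit 0 */
	AF_CHANGE_CLEAN,    /* bit 1 */
	AF_CHANGE_PENDING,  /* bit 2 */
	AF_CLOSING,         /* bit 3 */
	AF_NR_FLAGS         /* number of flags, used for the check below */
};

/* All flags must fit in a single unsigned long word. */
_Static_assert(AF_NR_FLAGS <= sizeof(unsigned long) * CHAR_BIT,
	       "too many flags for one word");

static void set_flag(unsigned long *flags, enum array_flag_model nr)
{
	*flags |= 1UL << nr;
}

static bool test_flag(unsigned long flags, enum array_flag_model nr)
{
	return (flags >> nr) & 1UL;
}
```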
md/raid1: fix: IO can block resync indefinitely · f2c771a6

Committed by NeilBrown on Nov 09, 2016

While performing a resync/recovery, raid1 divides the
array space into three regions:
 - before the resync
 - at or shortly after the resync point
 - much further ahead of the resync point.

Write requests to the first or third do not need to wait.  Write
requests to the middle region do need to wait if resync requests are
pending.

If there are any active write requests in the middle region, resync
will wait for them.

Due to an accounting error, there is a small range of addresses,
between conf->next_resync and conf->start_next_window, where write
requests will *not* be blocked, but *will* be counted in the middle
region.  This can effectively block resync indefinitely if filesystem
writes happen repeatedly to this region.

As ->next_window_requests is incremented when the sector is after
  conf->start_next_window + NEXT_NORMALIO_DISTANCE
the same boundary should be used for determining when write requests
should wait.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

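Below is a simplified model of the three-region rule and the fix, offered as a sketch only. `resync_model`, `counted_in_next_window()`, `write_must_wait()` and the `NORMALIO_DISTANCE` value are hypothetical stand-ins; the real code uses `NEXT_NORMALIO_DISTANCE`, whose value is not assumed here. The sketch also assumes that "after the boundary" means "at or past it". The point it shows is that the wait test and the next-window accounting share one boundary, `start_next_window + distance`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Illustrative distance only; NOT the kernel's NEXT_NORMALIO_DISTANCE value. */
#define NORMALIO_DISTANCE 1024u

/* Hypothetical, simplified snapshot of raid1 resync state. */
struct resync_model {
	sector_t resync_completed;  /* everything below this is already in sync */
	sector_t start_next_window; /* plays the role of conf->start_next_window */
	bool resync_pending;        /* resync requests are outstanding */
};

/* Accounting rule: writes starting at or past this boundary are counted
 * as next-window requests, so resync does not wait for them. */
static bool counted_in_next_window(const struct resync_model *m, sector_t start)
{
	return start >= m->start_next_window + NORMALIO_DISTANCE;
}

/* Wait rule after the fix: it reuses the accounting boundary above, so
 * there is no gap between next_resync and start_next_window. */
static bool write_must_wait(const struct resync_model *m,
			    sector_t start, sector_t end)
{
	if (!m->resync_pending)
		return false;
	if (end <= m->resync_completed)       /* region 1: before the resync */
		return false;
	if (counted_in_next_window(m, start)) /* region 3: far ahead */
		return false;
	return true;                          /* region 2: near the resync point */
}
```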
Nov 08, 2016 · 19 commits

md/bitmap: Don't write bitmap while earlier writes might be in-flight · 85c9ccd4

Committed by NeilBrown on Nov 04, 2016

As we don't wait for writes to complete in bitmap_daemon_work, they
could still be in-flight when bitmap_unplug writes again, or when
bitmap_daemon_work tries to write again.
This can be confusing and risks the wrong data being written last.

So make sure we wait for old writes to complete before new writes start.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid10: abort delayed writes when device fails. · a9ae93c8

Committed by NeilBrown on Nov 04, 2016

When writing to an array with a bitmap enabled, the writes are grouped
in batches which are preceded by an update to the bitmap.

If a drive develops a problem which is not media related, it is quite
likely that the bitmap write will be the first to report an error and
cause the device to be marked faulty (as the bitmap write is at the
start of a batch).

In this case, there is no point submitting the subsequent writes to the
failed device; that just wastes time.

So re-check the Faulty state of a device before submitting a
delayed write.

This requires that we keep the 'rdev', rather than the 'bdev', in the
bio, then swap in the bdev just before final submission.
Reported-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid1: abort delayed writes when device fails. · 5e2c7a36

Committed by NeilBrown on Nov 04, 2016

When writing to an array with a bitmap enabled, the writes are grouped
in batches which are preceded by an update to the bitmap.

If a drive develops a problem which is not media related, it is quite
likely that the bitmap write will be the first to report an error and
cause the device to be marked faulty (as the bitmap write is at the
start of a batch).

In this case, there is no point submitting the subsequent writes to the
failed device; that just wastes time.

So re-check the Faulty state of a device before submitting a
delayed write.

This requires that we keep the 'rdev', rather than the 'bdev', in the
bio, then swap in the bdev just before final submission.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

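The re-check described in the raid1 and raid10 commits can be modelled roughly as below. This is a user-space sketch that assumes simplified stand-in types (`rdev_model`, `bio_model`) rather than the kernel's `struct md_rdev` and `struct bio`. Completing a skipped bio with `-EIO` is an assumption standing in for the real error completion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for md's rdev/bdev/bio; not the kernel types. */
struct bdev_model { int id; };

struct rdev_model {
	bool faulty;               /* e.g. set after the bitmap write failed */
	struct bdev_model *bdev;
};

struct bio_model {
	struct rdev_model *rdev;   /* kept while the write sits in the batch */
	struct bdev_model *bdev;   /* filled in only at final submission */
	int error;                 /* 0, or a negative errno on completion */
	struct bio_model *next;
};

/* Flush a batch of delayed writes, re-checking Faulty for each one.
 * Returns the number of bios that would actually be submitted. */
static int flush_delayed_writes(struct bio_model *list)
{
	int submitted = 0;

	for (struct bio_model *b = list; b != NULL; b = b->next) {
		if (b->rdev->faulty) {
			b->error = -EIO;     /* complete with an error, skip submission */
			continue;
		}
		b->bdev = b->rdev->bdev; /* swap in the bdev just before submission */
		submitted++;
	}
	return submitted;
}
```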
md: perform async updates for metadata where possible. · 060b0689

Committed by NeilBrown on Nov 04, 2016

When adding devices to, or removing devices from, an array we need to
update the metadata.  However we don't need to do it synchronously as
data integrity doesn't depend on these changes being recorded
instantly.  So avoid the synchronous call to md_update_sb and just set
a flag so that the thread will do it.

This can reduce the number of updates performed when lots of devices
are being added or removed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

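Below is a rough model of the "set a flag and let the thread do it" approach. However many device changes arrive before the md thread runs, only one superblock write follows. `array_model`, `note_device_change()` and `md_thread_pass()` are hypothetical; the real code sets a bit in the mddev flags and wakes the md thread.

```c
#include <stdbool.h>

/* Hypothetical model: a bool and a counter stand in for the mddev flag
 * and for the superblock writes done by the md thread. */
struct array_model {
	bool sb_update_needed;  /* "please update the superblock" */
	int sb_writes;          /* superblock writes actually performed */
};

/* Called on device add/remove: record the need instead of writing now. */
static void note_device_change(struct array_model *a)
{
	a->sb_update_needed = true;
	/* the real code would also wake the md thread here */
}

/* One pass of the md thread: at most one superblock update per pass. */
static void md_thread_pass(struct array_model *a)
{
	if (a->sb_update_needed) {
		a->sb_update_needed = false;
		a->sb_writes++;
	}
}
```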
raid5-cache: restrict the use area of the log_offset variable · 3fd880af

Committed by JackieLiu on Nov 02, 2016

We can calculate this offset from ctx->meta_total_blocks instead of
passing it in as a function argument.
Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid5: change printk() to pr_*() · cc6167b4

Committed by NeilBrown on Nov 02, 2016

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid10: change printk() to pr_*() · 08464e09

Committed by NeilBrown on Nov 02, 2016

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid1: change printk() to pr_*() · 1d41c216

Committed by NeilBrown on Nov 02, 2016

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid0: replace printk() with pr_*() · 76603884

Committed by NeilBrown on Nov 02, 2016

This makes md/raid0 much less verbose, as the messages about
the array geometry are now pr_debug().
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/multipath: replace printk() with pr_*() · 7279694d

Committed by NeilBrown on Nov 02, 2016

Also remove all messages about memory allocation failure;
page_alloc() reports those.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/linear: replace printk() with pr_*() · a2e202af

Committed by NeilBrown on Nov 02, 2016

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/bitmap: change all printk() to pr_*() · ec0cc226

Committed by NeilBrown on Nov 02, 2016

Follow the err/warn distinction introduced in md.c.
Join multi-part strings into a single string.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: change all printk() to pr_err() or pr_warn() etc. · 9d48739e

Committed by NeilBrown on Nov 02, 2016

1/ using pr_debug() for a number of messages reduces the noise of
   md, but still allows them to be enabled when needed.
2/ try to be consistent in the usage of pr_err() and pr_warn(), and
   document the intention
3/ When strings have been split onto multiple lines, rejoin into
   a single string.
   The cost of having lines > 80 chars is less than the cost of not
   being able to easily search for a particular message.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: fix some issues with alloc_disk_sb() · 7f0f0d87

Committed by NeilBrown on Nov 02, 2016

1/ Don't print a warning if allocation fails;
   page_alloc() does that already.
2/ Always check the return status for errors.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/bitmap: call bitmap_file_unmap once bitmap_storage_alloc returns -ENOMEM · cbb38732

Committed by Guoqing Jiang on Oct 31, 2016

bitmap_storage_alloc() can return -ENOMEM after some members of store,
such as filemap, have already been allocated.

To avoid a memory leak, bitmap_resize() needs to call bitmap_file_unmap()
to free those members.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

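The sketch below shows the leak and its fix using hypothetical stand-ins. `storage_model` plays the role of the bitmap storage. `storage_alloc()` stands in for `bitmap_storage_alloc()`, `storage_unmap()` for `bitmap_file_unmap()`, and `resize_storage()` for `bitmap_resize()`. The `fail_attr` parameter exists only to simulate the second allocation failing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bitmap storage structure. */
struct storage_model {
	void **filemap;              /* allocated first */
	unsigned char *filemap_attr; /* allocated second */
};

/* Frees whatever was allocated; safe on partially initialised storage. */
static void storage_unmap(struct storage_model *s)
{
	free(s->filemap);
	free(s->filemap_attr);
	s->filemap = NULL;
	s->filemap_attr = NULL;
}

static int storage_alloc(struct storage_model *s, size_t pages, bool fail_attr)
{
	s->filemap = calloc(pages, sizeof(*s->filemap));
	if (!s->filemap)
		return -ENOMEM;
	s->filemap_attr = fail_attr ? NULL : calloc(pages, 1);
	if (!s->filemap_attr)
		return -ENOMEM;        /* filemap is still allocated at this point */
	return 0;
}

static int resize_storage(struct storage_model *s, size_t pages, bool fail_attr)
{
	int ret = storage_alloc(s, pages, fail_attr);

	if (ret)
		storage_unmap(s);      /* the fix: free partially allocated members */
	return ret;
}
```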
raid5: revert commit · 7adb072c

Committed by Tomasz Majchrzak on Oct 26, 2016

Revert commit 11367799 ("md: Prevent IO hold during accessing to faulty
raid5 array") as it doesn't comply with commit c3cce6cd ("md/raid5:
ensure device failure recorded before write request returns."). That change
is not required anymore, as the problem is resolved by commit 16f88949
("md: report 'write_pending' state when array in sync"): the read request
got stuck because the array state was not reported correctly via the sysfs
attribute.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: wake up personality thread after array state update · 91a6c4ad

Committed by Tomasz Majchrzak on Oct 25, 2016

When a raid1/raid10 array fails to write to one of the drives, the request
is added to bio_end_io_list and finished by the personality thread. The
thread doesn't handle it as long as the MD_CHANGE_PENDING flag is set. In
the case of external metadata this flag is cleared, but the thread is
not woken up. This causes the request to be blocked for a few seconds (until
another action on the array wakes up the thread) or to get stuck
indefinitely.

Wake up the personality thread once MD_CHANGE_PENDING has been cleared.
Moving the 'restart_array' call after the flag is cleared is not a solution,
because in read-write mode the call doesn't wake up the thread.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: don't fail an array if there are unacknowledged bad blocks · dcbcb486

Committed by Tomasz Majchrzak on Oct 21, 2016

If the external metadata handler supports bad blocks and unacknowledged bad
blocks are present, don't report the disk via sysfs as faulty. Such a
situation can still be handled, so the disk just has to be blocked for a
moment. This makes it consistent with the kernel state, as the corresponding
rdev flag is also not set.

When the disk is being unblocked, there are a few cases:
1. Disk has been in blocked and faulty state; it is being unblocked but
it still remains in faulty state. The metadata handler will remove it from
the array in the next call.
2. There is no bad block support in the external metadata handler and bad
blocks are present - put the disk in blocked and faulty state (see
case 1).
3. There is bad block support in the external metadata handler and all bad
blocks are acknowledged - clear all flags, continue.
4. There is bad block support in the external metadata handler but there are
still unacknowledged bad blocks - clear all flags, continue. It is fine
to clear the Blocked flag because it was probably not set anyway (if it was,
it is case 1). The BlockedBadBlocks flag can also be cleared because the
request waiting for it will set it again when it finds out that some bad
block is still not acknowledged. Recovery is not necessary, but there are
no problems if the flag is set. The sysfs rdev state is still reported as
blocked (due to unacknowledged bad blocks), so the metadata handler will
process the remaining bad blocks and unblock the disk again.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

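The four unblocking cases can be summarised as a small decision function. This is a model under stated assumptions, not the kernel's sysfs handler. The field and enum names are hypothetical. Case 2 is modelled as "unacknowledged bad blocks present", on the assumption that those are the blocks a handler without bad block support could never acknowledge.

```c
#include <stdbool.h>

/* Hypothetical model of the rdev state consulted when a disk is unblocked. */
struct rdev_unblock_model {
	bool faulty;
	bool external_bbl;   /* external metadata handler supports bad blocks */
	bool unacked_bbs;    /* unacknowledged bad blocks exist */
};

enum unblock_outcome {
	STAYS_FAULTY,                   /* case 1: handler removes it later */
	BECOMES_FAULTY,                 /* case 2: handler can't take bad blocks */
	CLEARED,                        /* case 3: all acknowledged (or none) */
	CLEARED_STILL_BLOCKED_IN_SYSFS, /* case 4: handler must ack the rest */
};

static enum unblock_outcome unblock(const struct rdev_unblock_model *r)
{
	if (r->faulty)
		return STAYS_FAULTY;
	if (r->unacked_bbs)
		return r->external_bbl ? CLEARED_STILL_BLOCKED_IN_SYSFS
				       : BECOMES_FAULTY;
	return CLEARED;
}
```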
md: add bad block support for external metadata · 35b785f7

Committed by Tomasz Majchrzak on Oct 21, 2016

Add a new rdev flag which the external metadata handler can use to switch
bad block support on/off. If a new bad block is encountered, notify it via
the rdev 'unacknowledged_bad_blocks' sysfs file. If a bad block has been
cleared, notify the update via the rdev 'bad_blocks' sysfs file.

When bad block support is being removed, just clear the rdev flag. It is
not necessary to reset the badblocks->shift field. If bad blocks are
cleared or added at the same time, it is OK for those changes to be
applied to the structure. The array is in blocked state and the drive
which cannot handle bad blocks any more will be removed from the array
before it is unlocked.

Simplify the state_show function by adding a separator at the end of each
string and overwriting the last separator with a newline.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

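The separator trick for state_show can be sketched in user space as below. The flag names mirror strings the rdev `state` file can report, but the struct, the function and the `snprintf()` bookkeeping are illustrative. The sketch assumes the caller's buffer is large enough for every flag; 64 bytes is plenty here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of per-device state flags. */
struct rdev_flags_model {
	bool faulty;
	bool in_sync;
	bool blocked;
};

/* Every entry is printed with its own trailing ',', so the last one is
 * simply overwritten by '\n' instead of tracking "is this the first?". */
static size_t state_show_model(char *page, size_t size,
			       const struct rdev_flags_model *f)
{
	size_t len = 0;

	if (f->faulty)
		len += (size_t)snprintf(page + len, size - len, "faulty,");
	if (f->in_sync)
		len += (size_t)snprintf(page + len, size - len, "in_sync,");
	if (f->blocked)
		len += (size_t)snprintf(page + len, size - len, "blocked,");
	if (len)
		len -= 1;                     /* step back over the last ',' */
	len += (size_t)snprintf(page + len, size - len, "\n");
	return len;
}
```

With `faulty` and `blocked` set, this produces `"faulty,blocked\n"`. With no flags set, it produces just `"\n"`.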
Oct 29, 2016 · 3 commits

md: be careful not lot leak internal curr_resync value into metadata. -- (all) · 1217e1d1

Committed by NeilBrown on Oct 28, 2016

mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.

 1 - means that the array is trying to start a resync, but has yielded
     to another array which shares physical devices, and also needs to
     start a resync
 2 - means the array is trying to start resync, but has found another
     array which shares physical devices and has already started resync.

 3 - means that resync has commenced, but it is possible that nothing
     has actually been resynced yet.

It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint.  In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, because it is not a multiple of 4K, and some devices only
support 4K aligned accesses.

There are two places where this value is propagated into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of values 1 and 2, but will
allow 3 to leak through.

Change them to only propagate the value if it is > 3.

As this can cause an array to fail, the patch is suitable for -stable.

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Viswesh <viswesh.vichu@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

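Below is a minimal sketch of the corrected guard. It assumes a 64-bit `sector_t` and uses hypothetical names for the magic values; the kernel compares against the bare numbers. Only values above 3 are real resync positions, so only those reach the checkpoint.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical names for the "magic" curr_resync values listed above. */
enum {
	CURR_RESYNC_YIELDED = 1,  /* yielded to another array sharing devices */
	CURR_RESYNC_WAITING = 2,  /* another array sharing devices is resyncing */
	CURR_RESYNC_STARTED = 3,  /* started, possibly nothing resynced yet */
};

/* Copy curr_resync into a checkpoint only when it is a real position.
 * A guard of "> 2" would let the value 3 leak into the metadata. */
static void update_checkpoint(sector_t curr_resync, sector_t *checkpoint)
{
	if (curr_resync > CURR_RESYNC_STARTED)
		*checkpoint = curr_resync;
}
```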
raid1: handle read error also in readonly mode · 7449f699

Committed by Tomasz Majchrzak on Oct 28, 2016

If a write is the first operation on a disk and it happens not to be
aligned to the page size, the block layer sends a read request first. If
the read operation fails, the disk is set as failed, as no attempt to fix
the error is made because the array is in auto-readonly mode. Similarly, the
disk is set as failed for a read-only array.

Take the same approach as in raid10. Don't fail the disk if the array is in
readonly or auto-readonly mode. Try to redirect the request first and, if
unsuccessful, return a read error.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

raid5-cache: correct condition for empty metadata write · 9a8b27fa

Committed by Shaohua Li on Oct 27, 2016

As long as we recover at least one metadata block, we should write an
empty metadata block. The original code could corrupt recovery if only
one metadata block is valid.
Reported-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>

Oct 25, 2016 · 5 commits

md: report 'write_pending' state when array in sync · 16f88949

Committed by Tomasz Majchrzak on Oct 24, 2016

If there is a bad block on a disk and a recovery is performed from
this disk, the same bad block is reported for the new disk. This involves
setting the MD_CHANGE_PENDING flag in rdev_set_badblocks. For external
metadata this flag is not cleared, as the array state is reported as
'clean'. A read request to the bad block in a RAID5 array gets stuck as it
is waiting for the flag to be cleared, as per commit c3cce6cd
("md/raid5: ensure device failure recorded before write request
returns.").

The meaning of the MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags was
clarified in commit 070dc6dd ("md: resolve confusion of
MD_CHANGE_CLEAN"); however, MD_CHANGE_PENDING has since been used in
personality error handlers, which doesn't fully comply with its
initial purpose. It was supposed to signal that a write request is about
to start, but now it is also used to request a metadata update.
Initially (in md_allow_write, md_write_start) the MD_CHANGE_PENDING flag
was set and in_sync was set to 0 at the same time. Error handlers
just set the flag without modifying the in_sync value. The sysfs array state
is a single value, so it now reports 'clean' when the MD_CHANGE_PENDING flag
is set and in_sync is 1. Userspace has no idea it is expected to
take some action.

Swap the order in which the array state is checked, so 'write_pending' is
reported ahead of 'clean' ('write_pending' is a misleading name but it
is too late to rename it now).
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

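The reordering can be modelled as below. This is a simplified sketch: the real `array_state` attribute has more states than the four shown, and the function name and parameters are hypothetical.

```c
#include <stdbool.h>

/* Subset of the sysfs array_state values relevant here. */
enum array_state_model { ST_CLEAN, ST_WRITE_PENDING, ST_ACTIVE_IDLE, ST_ACTIVE };

/* Ordering after the fix: a pending change is reported before 'clean',
 * so user space sees 'write_pending' even when in_sync == 1. */
static enum array_state_model array_state(bool change_pending, bool in_sync,
					  bool safemode)
{
	if (change_pending)
		return ST_WRITE_PENDING;
	if (in_sync)
		return ST_CLEAN;
	if (safemode)
		return ST_ACTIVE_IDLE;
	return ST_ACTIVE;
}
```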
md/raid5: write an empty meta-block when creating log super-block · 56056c2e

Committed by Zhengyuan Liu on Oct 24, 2016

If the superblock points to an invalid meta block, r5l_load_log will set
create_super to true and create a new superblock. This path is always
taken at runtime if no write I/O has been done to the array since it was
created. Writing an empty meta block when the log superblock is first
created avoids this unnecessary work.

Another reason is the correctness of log recovery. Currently we have
the code below to guarantee that log recovery is correct.

        if (ctx.seq > log->last_cp_seq + 1) {
                int ret;

                ret = r5l_log_write_empty_meta_block(log, ctx.pos, ctx.seq + 10);
                if (ret)
                        return ret;
                log->seq = ctx.seq + 11;
                log->log_start = r5l_ring_add(log, ctx.pos, BLOCK_SECTORS);
                r5l_write_super(log, ctx.pos);
        } else {
                log->log_start = ctx.pos;
                log->seq = ctx.seq;
        }

If we have just created an array with a journal device, log->log_start and
log->last_checkpoint are both 0. Suppose we then write three meta blocks,
all valid except the middle one, and a crash happens. ctx.seq would equal
log->last_cp_seq + 1, and after recovery log->log_start would be set to
the position of the invalid middle meta block. This leads to problems
which this patch avoids.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid5: initialize next_checkpoint field before use · 28cd88e2

Committed by Zhengyuan Liu on Oct 24, 2016

This field was not initialized when the log was loaded/recovered; it
was only assigned when I/O to the raid disks finished. So r5l_quiesce
may use a wrong next_checkpoint to reclaim log space, which would
confuse the reclaimable space calculation.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Shaohua Li <shli@fb.com>

RAID10: ignore discard error · 579ed34f

Committed by Shaohua Li on Oct 06, 2016

This is the counterpart of the raid1 fix. If a write error occurs, raid10
will try to rewrite the bio in small chunk sizes. If the rewrite fails,
raid10 will record the error as a bad block. narrow_write_error will
always use WRITE for the bio, but the bio could actually be a discard.
Since a discard bio has no payload, writing it will cause various issues.
But a discard error isn't fatal, so we can safely ignore it. This is what
this patch does.

This issue has probably existed since discard support was added, but was
only exposed by the recent arbitrary bio size feature.

Cc: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>

RAID1: ignore discard error · e3f948cd

Committed by Shaohua Li on Oct 06, 2016

If a write error occurs, raid1 will try to rewrite the bio in small
chunk sizes. If the rewrite fails, raid1 will record the error as a bad
block. narrow_write_error will always use WRITE for the bio, but the bio
could actually be a discard. Since a discard bio has no payload, writing
it will cause various issues. But a discard error isn't fatal, so we
can safely ignore it. This is what this patch does.

This issue has probably existed since discard support was added, but was
only exposed by the recent arbitrary bio size feature.
Reported-and-tested-by: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>

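Both the RAID1 and RAID10 changes reduce to one predicate evaluated on write completion. The sketch below is hypothetical: `OP_WRITE_M` and `OP_DISCARD_M` are model names standing in for the kernel's request operations.

```c
#include <errno.h>
#include <stdbool.h>

/* Model stand-ins for the request operation of a completed bio. */
enum req_op_model { OP_WRITE_M, OP_DISCARD_M };

/* Should a failed bio go down the write-error path
 * (narrow_write_error retry, then bad-block recording)? */
static bool needs_write_error_handling(enum req_op_model op, int error)
{
	bool discard_error = error != 0 && op == OP_DISCARD_M;

	return error != 0 && !discard_error;  /* discard failures are ignored */
}
```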
Oct 24, 2016 · 1 commit

dm table: fix missing dm_put_target_type() in dm_table_add_target() · dafa724b

Committed by tang.junhui on Oct 21, 2016

dm_get_target_type() was previously called so any error returned from
dm_table_add_target() must first call dm_put_target_type().  Otherwise
the DM target module's reference count will leak and the associated
kernel module will be unable to be removed.

Also, leverage the fact that r is already -EINVAL and remove an extra
newline.

Fixes: 36a0456f ("dm table: add immutable feature")
Fixes: cc6cbe14 ("dm table: add always writeable feature")
Fixes: 3791e2fc ("dm table: add singleton feature")
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

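The rule that every error return after the "get" must go through the "put" can be illustrated with a reference-counted model. `target_type_model`, `get_target_type()`, `put_target_type()` and `add_target()` are hypothetical stand-ins for `dm_get_target_type()`, `dm_put_target_type()` and `dm_table_add_target()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a target type whose module reference is counted. */
struct target_type_model { int refcount; };

static struct target_type_model *get_target_type(struct target_type_model *tt)
{
	tt->refcount++;   /* like pinning the target's module */
	return tt;
}

static void put_target_type(struct target_type_model *tt)
{
	tt->refcount--;   /* releases the pin */
}

/* Every error return after the "get" goes through the "put". */
static int add_target(struct target_type_model *type, bool args_invalid)
{
	int r = -EINVAL;
	struct target_type_model *tt = get_target_type(type);

	if (args_invalid)
		goto bad;

	return 0;         /* success: the table keeps the reference */

bad:
	put_target_type(tt);
	return r;
}
```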
Oct 19, 2016 · 2 commits

dm rq: clear kworker_task if kthread_run() returned an error · 937fa62e

Committed by Mike Snitzer on Oct 18, 2016

cleanup_mapped_device() calls kthread_stop() if kworker_task is
non-NULL.  Currently the assigned value could be a valid task struct or
an error code (e.g. -ENOMEM).  Reset md->kworker_task to NULL if
kthread_run() returned an error.

Fixes: 7193a9de ("dm rq: check kthread_run return for .request_fn request-based DM")
Cc: stable@vger.kernel.org # 4.8
Reported-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

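Below is a user-space model of the error-pointer pitfall. The kernel encodes errors in pointers with `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()`. The helpers below re-implement that idea for illustration only, and `fake_kthread_run()`, `start_kworker()` and `cleanup_md()` are hypothetical. The point is that a field which is only ever tested against NULL must never be left holding an encoded error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom. */
#define MODEL_MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct task_model { int id; };

struct md_model {
	struct task_model *kworker_task;  /* NULL or a valid task, never an error */
	int stops;                        /* counts stand-in kthread_stop() calls */
};

/* Stand-in for kthread_run(): returns a task or an encoded error. */
static struct task_model *fake_kthread_run(struct task_model *t, bool fail)
{
	return fail ? err_ptr(-ENOMEM) : t;
}

static int start_kworker(struct md_model *md, struct task_model *t, bool fail)
{
	md->kworker_task = fake_kthread_run(t, fail);
	if (is_err(md->kworker_task)) {
		long r = ptr_err(md->kworker_task);
		md->kworker_task = NULL;  /* the fix: no error pointer left behind */
		return (int)r;
	}
	return 0;
}

static void cleanup_md(struct md_model *md)
{
	if (md->kworker_task)  /* safe: only a valid task can be non-NULL */
		md->stops++;       /* stands in for kthread_stop() */
}
```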
dm: free io_barrier after blk_cleanup_queue call · d09960b0

Committed by Tahsin Erdogan on Oct 10, 2016

dm_old_request_fn() has paths that access md->io_barrier.  The party
destroying io_barrier should ensure that no future execution of
dm_old_request_fn() is possible.  Move io_barrier destruction to below
blk_cleanup_queue() to ensure this and avoid a NULL pointer crash during
request-based DM device shutdown.

Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Oct 18, 2016 · 1 commit

dm raid: fix activation of existing raid4/10 devices · b052b07c

Committed by Heinz Mauelshagen on Oct 17, 2016

dm-raid 1.9.0 fails to activate existing RAID4/10 devices that have the
old superblock format (which does not have takeover/reshaping support
that was added via commit 33e53f06).

Fix validation path for old superblocks by reverting to the old raid4
layout and basing checks on mddev->new_{level,layout,...} members in
super_init_validation().

Cc: stable@vger.kernel.org # 4.8
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Oct 14, 2016 · 2 commits

dm mirror: use all available legs on multiple failures · 12a7cf5b

Committed by Heinz Mauelshagen on Oct 10, 2016

When any leg(s) have failed, any read will cause a new operational
default leg to be selected and the read is resubmitted to it. If that
new default leg fails the read too, no other still accessible legs are
used to resubmit the read again -- thus failing the io.

Fix by allowing the read to get resubmitted until all operational legs
have been exhausted. Also, remove any details.bi_dev use as a flag.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm mirror: fix read error on recovery after default leg failure · dcb2ff56

Committed by Heinz Mauelshagen on Oct 10, 2016

If a default leg has failed, any read will cause a new operational
default leg to be selected and the read is resubmitted.  But until now
the read would return failure even though the resubmitted read
succeeded.  The reason for this is that bio->bi_error was not being
cleared before resubmitting the bio.

Fix by clearing bio->bi_error before resubmission.

Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

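The two dm mirror fixes combine into one retry loop. The loop tries every operational leg in turn and clears the error before each resubmission, because the error field, like `bio->bi_error`, is only written on failure. This is a user-space sketch with hypothetical types. `broken[]` models the hardware, and `failed[]` models what the target has detected so far.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_LEGS 3

/* Hypothetical mirror state: hardware truth vs. what the target knows. */
struct mirror_model {
	bool broken[NR_LEGS];
	bool failed[NR_LEGS];
	int default_leg;
};

struct read_model { int error; };

/* Like bio->bi_error, the error field is written only on failure. */
static void submit_read(const struct mirror_model *m, int leg, struct read_model *rd)
{
	if (m->broken[leg])
		rd->error = -EIO;
}

static int pick_operational_leg(const struct mirror_model *m)
{
	for (int i = 0; i < NR_LEGS; i++)
		if (!m->failed[i])
			return i;
	return -1;
}

/* Retry the read on each operational leg until one succeeds or all fail. */
static int mirror_read(struct mirror_model *m, struct read_model *rd)
{
	int leg = m->default_leg;

	for (;;) {
		rd->error = 0;          /* clear the stale error before (re)submission */
		submit_read(m, leg, rd);
		if (rd->error == 0)
			return 0;
		m->failed[leg] = true;
		leg = pick_operational_leg(m);
		if (leg < 0)
			return rd->error;   /* every leg exhausted */
		m->default_leg = leg;
	}
}
```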
Oct 12, 2016 · 2 commits

kthread: kthread worker API cleanup · 3989144f

Committed by Petr Mladek on Oct 11, 2016

A good practice is to prefix the names of functions by the name
of the subsystem.

The kthread worker API is a mix of classic kthreads and workqueues.  Each
worker has a dedicated kthread.  It runs a generic function that processes
queued works.  It is implemented as part of the kthread subsystem.

This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:

__init_kthread_worker()		-> __kthread_init_worker()
init_kthread_worker()		-> kthread_init_worker()
init_kthread_work()		-> kthread_init_work()
insert_kthread_work()		-> kthread_insert_work()
queue_kthread_work()		-> kthread_queue_work()
flush_kthread_work()		-> kthread_flush_work()
flush_kthread_worker()		-> kthread_flush_worker()

Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.

Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:

  + "init" in the function names stands for the verb "initialize"
    aka "initialize worker". While "INIT" in the macro names
    stands for the noun "INITIALIZER" aka "worker initializer".

  + INIT() macros are used only in DEFINE() macros

  + init() functions are used close to the other kthread()
    functions. It looks much better if all the functions
    use the same scheme.

  + There will be also kthread_destroy_worker() that will
    be used close to kthread_cancel_work(). It is related
    to the init() function. Again it looks better if all
    functions use the same naming scheme.

  + there are several precedents for such init() function
    names, e.g. amd_iommu_init_device(), free_area_init_node(),
    jump_label_init_type(),  regmap_init_mmio_clk(),

  + It is not an argument but it was inconsistent even before.

[arnd@arndb.de: fix linux-next merge conflict]
 Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dm raid: fix compat_features validation · 5c33677c

Committed by Andy Whitcroft on Oct 11, 2016

In ecbfb9f1 ("dm raid: add raid level takeover support") a new
compatible feature flag was added.  Validation for these compat_features
was added, but it only passes for new raid mappings with this feature
flag.  This causes previously created raid mappings to fail at
import.

Check compat_features for the only valid combination.

Fixes: ecbfb9f1 ("dm raid: add raid level takeover support")
Cc: stable@vger.kernel.org # v4.8
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Oct 04, 2016 · 1 commit

md: set rotational bit · bb086a89

Committed by Shaohua Li on Sep 30, 2016

If all disks in an array are non-rotational, set the array
non-rotational.

This only works for arrays with all disks populated at startup. Support
for disk hot-add/hot-remove could be added later if necessary.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>

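Below is a sketch of the "all members non-rotational" test. The struct and function are hypothetical. Treating a degraded array as rotational is an assumption, drawn from the note that only arrays with all disks present at startup are handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-member info; the real code would query each member's queue. */
struct member_model { bool nonrot; };

/* Non-rotational only if every member is non-rotational and none is missing. */
static bool array_is_nonrot(const struct member_model *m, size_t n, bool degraded)
{
	if (degraded || n == 0)
		return false;
	for (size_t i = 0; i < n; i++)
		if (!m[i].nonrot)
			return false;
	return true;
}
```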
Sep 29, 2016 · 1 commit

dm mpath: always return reservation conflict without failing over · 8ff232c1

Committed by Hannes Reinecke on Jul 15, 2015

If dm-mpath encounters a reservation conflict it should not fail the
path (as communication with the target is not affected) but should
rather retry on another path.  However, in doing so we might be inducing
a ping-pong between paths, with no guarantee of any forward progress.
And arguably a reservation conflict is an unexpected error, so we should
be passing it upwards to allow the application to take appropriate
steps.

This change resolves a show-stopper problem seen with the pNFS SCSI
layout because it is trivial to hit reservation conflict based failover
loops without it.

Doubts were raised about the implications of this change relative to
products like IBM's SVC.  But there is little point withholding a fix
for Linux because a proprietary product may or may not have some issues
in its implementation of how it interfaces with Linux.  In the future,
if there is glaring evidence that this change is certainly problematic
we can revisit it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com> # tweaked header
