- 06 Jan 2017, 2 commits
-
-
Committed by Song Liu
We only need to update sh->log_start at the end of recovery, which happens in r5c_recovery_rewrite_data_only_stripes(), so it is not necessary to set it before that. In this patch, log_start is removed from r5c_recovery_alloc_stripe(). After updating all sh->log_start, r5c_recovery_rewrite_data_only_stripes() also updates log->next_checkpoint to the last sh->log_start. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
Write-through mode already returns earlier in the function, so there is no need to handle it again. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 04 Jan 2017, 2 commits
-
-
Committed by Robert LeBlanc
Refactor raid10_make_request into separate read and write functions to clean up the code. Shaohua: add the recovery check back to the read path. Signed-off-by: Robert LeBlanc <robert@leblancnet.us> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Robert LeBlanc
Refactor raid1_make_request to split the read and write code into their own functions to clean up the code. Signed-off-by: Robert LeBlanc <robert@leblancnet.us> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 25 Dec 2016, 1 commit
-
-
Committed by Linus Torvalds
This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Dec 2016, 2 commits
-
-
Committed by Eric Wheeler
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Wido den Hollander <wido@widodh.nl>
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
- 16 Dec 2016, 1 commit
-
-
Committed by Michael S. Tsirkin
__bitwise__ used to mean "yes, please enable sparse checks unconditionally", but now that we dropped __CHECK_ENDIAN__, __bitwise is exactly the same. There aren't many users, so replace it with __bitwise everywhere. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Lee Duncan <lduncan@suse.com>
-
- 14 Dec 2016, 1 commit
-
-
Committed by Mike Snitzer
Recent dm-flakey fixes, to have reads error out during the "down" interval, made it so that the previous read behaviour is no longer available. It is useful to have reads complete like normal but have writes error out, so make that possible again with a new "error_writes" feature. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 09 Dec 2016, 25 commits
-
-
Committed by Shaohua Li
mddev->flags is used for different purposes. There are a lot of places where we check/change the flags without masking unrelated flags, so we could end up checking/changing unrelated flags. These usages are mostly for superblock writes, so separate out the superblock-related flags. This should make the code clearer and also fixes real bugs. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li
Fixes: 90f5f7ad ("md: Wait for md_check_recovery before attempting device removal.") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li
When we change the level from raid1 to raid5, the MD_FAILFAST_SUPPORTED bit will be accidentally set, but raid5 doesn't support it. The same is true for the MD_HAS_JOURNAL bit. Fixes: 46533ff7 ("md: Use REQ_FAILFAST_* on metadata writes where appropriate") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Mike Snitzer
Switch to using hash_32() because hash_32_generic() should only be used by the kernel's selftests. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Ondrej Kozina
Unfortunately key_string may theoretically contain whitespace even after it's processed by dm_split_args(). The reason for this is that DM core supports escaping of almost all chars, including any whitespace. If userspace passes a key to the kernel in the format ":32:logon:my_prefix:my\ key", dm-crypt will look up the key "my_prefix:my key" in the kernel keyring service. So far everything's fine. Unfortunately, if userspace later calls the DM_TABLE_STATUS ioctl, it will not receive back the expected ":32:logon:my_prefix:my\ key" but the unescaped version instead. Also, userspace (most notably cryptsetup) is not ready to parse a single target argument containing (even escaped) whitespace chars, and any whitespace is simply taken as a delimiter of another argument. This effect is mitigated by the fact that libdevmapper currently performs double escaping of the '\' char: any user input in the format "x\ x" is transformed into "x\\ x" before being passed to the kernel. Nonetheless dm-crypt may be used without libdevmapper. Therefore the near-term solution is to reject any key string containing whitespace. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Benjamin Marzinski
If no block was allocated or freed, sm_ll_mutate() wasn't setting *ev, leaving the variable uninitialized. sm_ll_insert(), sm_disk_inc_block(), and sm_disk_new_block() all check ev to see if there was an allocation event in sm_ll_mutate(), possibly reading uninitialized data. If no allocation event occurred, sm_ll_mutate() should set *ev to SM_NONE. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
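A minimal sketch of the bug class being fixed here (simplified stand-in names, not the actual dm-space-map code): an out-parameter such as *ev must be given a defined value on every path, including the "nothing changed" one, so callers never read an uninitialized variable.

```c
/* Simplified stand-in for the allocation-event enum mentioned above. */
enum allocation_event { SM_NONE, SM_ALLOC, SM_FREE };

/*
 * Pattern: set the out-parameter to a safe default first, and only
 * overwrite it when an allocation event actually happened.
 */
static int mutate_block(unsigned old_count, unsigned new_count,
                        enum allocation_event *ev)
{
	*ev = SM_NONE;			/* default: no block allocated or freed */

	if (old_count == 0 && new_count != 0)
		*ev = SM_ALLOC;
	else if (old_count != 0 && new_count == 0)
		*ev = SM_FREE;

	return 0;
}
```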
-
Committed by Benjamin Marzinski
When metadata_ll_init_index() is called by sm_ll_new_metadata(), ll->mi_le hasn't been initialized yet. So when metadata_ll_init_index() copies the contents of ll->mi_le into the newly allocated bitmap_root, it is just copying garbage. ll->mi_le will be allocated later, in sm_ll_extend(), and copied into the bitmap_root in sm_ll_commit(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Benjamin Marzinski
In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations are still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails while they are set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
Committed by Heinz Mauelshagen
Commit ecbfb9f1 ("dm raid: add raid level takeover support") moved the configure_discard_support() call from raid_ctr() to raid_preresume(). Enabling/disabling discard _must_ happen during table load (through the .ctr hook). Fix this regression by moving the configure_discard_support() call back to raid_ctr(). Fixes: ecbfb9f1 ("dm raid: add raid level takeover support") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Heinz Mauelshagen
Remove CTR_FLAG_MAX_WRITE_BEHIND from raid4/5/6's valid ctr flags. Only the md raid1 personality supports setting a maximum number of "write behind" write IOs on any legs set to "write mostly"; "write mostly" enhances throughput with slow links/disks. Technically the "write behind" value is a write intent bitmap property that is only respected by the raid1 personality. It allows a maximum number of "write behind" writes to any "write mostly" raid1 mirror legs to be delayed, and it avoids reads from such legs. No other MD personalities supported via dm-raid make use of "write behind", thus setting this property is superfluous; it wouldn't cause harm, but it is correct to reject it. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by tang.junhui
Let the requested m->hw_handler_params be used if the attached hardware handler is the same handler as requested with m->hw_handler_name. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Ondrej Kozina
The kernel key service is a generic way to store keys for the use of other subsystems. Currently there is no way to use kernel keys in dm-crypt. This patch aims to fix that. Instead of a key, userspace may pass a key description prefixed with ':', so the message that constructs an encryption mapping now looks like this:

    <cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]

where <key_string> is in the format:

    <key_size>:<key_type>:<key_description>

Currently we only support two elementary key types: 'user' and 'logon'. Keys may be loaded into dm-crypt either via <key_string> or by using the classical method of passing the key in hex representation directly. A dm-crypt device initialised with a key passed in hex representation may be replaced with a key passed in key_string format, and vice versa. (Based on original work by Andrey Ryabinin) Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
A value is assigned to 'nr_entries' but never used, so remove it. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
Subtracting sizes is a fragile approach because the result is only correct if the compiler has not added any padding at the end of the structure. Hence use offsetof() instead of size subtraction. An additional advantage of offsetof() is that it makes the intent clearer. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
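A standalone C illustration of the point above (not the patched dm code; the struct is made up): sizeof() includes any tail padding the compiler adds, while offsetof() names the boundary of interest directly.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical header followed by a variable-length payload. */
struct packet {
	unsigned int type;
	unsigned short flags;
	char payload[];		/* flexible array member */
};

int main(void)
{
	/* Includes 2 bytes of tail padding on typical ABIs: size is 8. */
	size_t by_sizeof = sizeof(struct packet);

	/* Robust and self-documenting: the header ends where payload begins (6). */
	size_t header_len = offsetof(struct packet, payload);

	printf("sizeof: %zu, offsetof: %zu\n", by_sizeof, header_len);
	return 0;
}
```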
-
Committed by Bart Van Assche
Use a single statement to declare and initialize 'use_blk_mq' instead of two statements. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
After QUEUE_FLAG_DYING has been set, any code that is waiting in get_request() should be woken up. But to get this behaviour, blk_set_queue_dying() must be used instead of only setting QUEUE_FLAG_DYING. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using a GFP_NOIO allocation (the lock is dropped because the allocation can take some time). Note that we won't do the GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
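A condensed sketch of the allocation pattern described above (illustrative only; 'cache', 'entry' and alloc_entry() are made-up names, not the real dm-bufio symbols): try a non-blocking allocation while holding the lock, and fall back to a blocking GFP_NOIO allocation only after dropping it.

```c
#include <linux/mutex.h>
#include <linux/slab.h>

struct entry {
	void *data;
};

struct cache {
	struct mutex lock;
};

/* Called with c->lock held. */
static struct entry *alloc_entry(struct cache *c)
{
	struct entry *e;

	/* Fast path: we must not sleep while the lock is held. */
	e = kmalloc(sizeof(*e), GFP_NOWAIT);
	if (e)
		return e;

	/*
	 * Slow path: drop the lock so the blocking, I/O-less allocation
	 * (GFP_NOIO) can take its time without stalling other lock users.
	 */
	mutex_unlock(&c->lock);
	e = kmalloc(sizeof(*e), GFP_NOIO);
	mutex_lock(&c->lock);

	return e;	/* may still be NULL; callers must handle that */
}
```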
-
Committed by Mikulas Patocka
dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects there are. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Douglas Anderson
We've seen in-field reports showing _lots_ (18 in one case, 41 in another) of tasks all sitting there blocked on:

    mutex_lock+0x4c/0x68
    dm_bufio_shrink_count+0x38/0x78
    shrink_slab.part.54.constprop.65+0x100/0x464
    shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

    Workqueue: kverityd verity_prefetch_io
    __switch_to+0x9c/0xa8
    __schedule+0x440/0x6d8
    schedule+0x94/0xb4
    schedule_timeout+0x204/0x27c
    schedule_timeout_uninterruptible+0x44/0x50
    wait_iff_congested+0x9c/0x1f0
    shrink_inactive_list+0x3a0/0x4cc
    shrink_lruvec+0x418/0x5cc
    shrink_zone+0x88/0x198
    try_to_free_pages+0x51c/0x588
    __alloc_pages_nodemask+0x648/0xa88
    __get_free_pages+0x34/0x7c
    alloc_buffer+0xa4/0x144
    __bufio_new+0x84/0x278
    dm_bufio_prefetch+0x9c/0x154
    verity_prefetch_io+0xe8/0x10c
    process_one_work+0x240/0x424
    worker_thread+0x2fc/0x424
    kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem can be reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem.
1. Pick the test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from http://crbug.com/468342 (that's just a memory stress test app).
3. On a 4GB rk3399 machine, run: nice ./launchBalloons.sh 4 900 100000 — that tries to eat 4 * 900 MB of memory and keep accessing it.
4. Log in to the Chrome web browser and restore many tabs.

With that, I've seen printouts like:

    DOUG: long bufio 90758 ms

...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while we're holding the dm_bufio lock. Instead we should be using GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the lock, and that causes the above problems.

The current behavior, explained by David Rientjes: it will still try reclaim initially because __GFP_WAIT (or __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of contention on dm_bufio_lock() that the thread holds. You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a mutex that can be contended by a concurrent slab shrinker (if count_objects didn't use a trylock, this pattern would trivially deadlock).

This change significantly increases responsiveness of the system while in this state. It makes a real difference because it unblocks kswapd. In the bug report analyzed, kswapd was hung:

    kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
    Call trace:
    [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
    [<ffffffc00090b794>] __schedule+0x440/0x6d8
    [<ffffffc00090bac0>] schedule+0x94/0xb4
    [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
    [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
    [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
    [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
    [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
    [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
    [<ffffffc00030e578>] balance_pgdat+0x328/0x508
    [<ffffffc00030eb7c>] kswapd+0x424/0x51c
    [<ffffffc00023f06c>] kthread+0x10c/0x114
    [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd, memory pressure should be reduced. Suggested-by: David Rientjes <rientjes@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
Use a single loop instead of two loops to determine whether or not all_blk_mq has to be set. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
When dm_table_set_type() is used by a target to establish a DM table's type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath), the DM core must go on to verify that the devices in the table are compatible with the established type. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
An earlier DM multipath table could have been built on top of underlying devices that were all using blk-mq. In that case, if that active multipath table is replaced with an empty DM multipath table (reflecting that all paths have failed), then it is important that the 'all_blk_mq' state of the active table is transferred to the new empty DM table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a request that isn't needed by the DM multipath target when it is to issue IO to an underlying blk-mq device. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Song Liu
Currently, we increase the journal entry seq by 10 after recovery. However, this is not sufficient in the following case. After a crash, the journal looks like:

    | seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we drop all entries from +1 to +12 and write seq+10:

    | seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we then write a big journal entry with seq+11, it will connect with the stale journal entries:

    | seq+0 | +10 | +11 | +12 |

To reduce the risk of this issue, we increase the seq by 10000 instead. Shaohua: use 10000 instead of 1000. The risk should be very unlikely: the total stripe cache size is typically less than 2k, and several stripes can fit into one metadata block, so the total number of in-flight metadata blocks would be quite small, which means the total number of sequence numbers used should be quite small. The 10000 sequence number increase should be far more than safe. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Song Liu
r5l_recovery_create_empty_meta_block() creates a crc for the empty meta block. After the meta block is updated, we need to clear the checksum before recalculating it. Shaohua: moved the checksum calculation out of r5l_recovery_create_empty_meta_block; we should calculate it after all fields are updated. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
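A generic sketch of the ordering this fix enforces (hypothetical struct and field names, not the raid5-cache definitions): zero the stale checksum field first, and compute the CRC only after every other field has its final value.

```c
#include <linux/kernel.h>
#include <linux/crc32c.h>

/* Hypothetical on-disk block layout, used only for this illustration. */
struct meta_block {
	__le64 seq;
	__le64 position;
	__le32 checksum;	/* crc32c over the whole block */
} __packed;

static void meta_block_finalize(struct meta_block *mb, u32 seed)
{
	/* Clear any stale checksum so it does not feed into the new CRC. */
	mb->checksum = 0;

	/* Compute the CRC last, after all other fields are final. */
	mb->checksum = cpu_to_le32(crc32c(seed, mb, sizeof(*mb)));
}
```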
-
Committed by JackieLiu
When creating the super-block information, we do not need to go through the recovery stage; we only need to initialize some variables. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 06 Dec 2016, 3 commits
-
-
Committed by NeilBrown
md_open() gets a counted reference on an mddev using mddev_find(). If it ends up returning an error, it must drop this reference. There are two error paths where the reference is not dropped. One only happens if the process is signalled at an awkward time, which is quite unlikely. The other was introduced recently in commit af8d8e6f. Change the code to ensure we drop the reference when returning an error, and make it harder to re-introduce this sort of bug in the future. Reported-by: Marc Smith <marc.smith@mcc.edu> Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
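A sketch of the error-path discipline this fix enforces (simplified, not the literal md.c code; example_open(), some_transient_condition() and do_open() are hypothetical names): once mddev_find() has taken a reference, every early return must pass through a path that calls mddev_put().

```c
/* Sketch only: funnel all error returns through a single label. */
static int example_open(struct block_device *bdev, fmode_t mode)
{
	struct mddev *mddev = mddev_find(bdev->bd_dev);
	int err;

	if (!mddev)
		return -ENODEV;		/* nothing to drop yet */

	if (some_transient_condition(mddev)) {
		err = -EBUSY;
		goto out_put;		/* must not leak the reference */
	}

	err = do_open(mddev, mode);
	if (err)
		goto out_put;

	return 0;			/* success: keep the reference until release */

 out_put:
	mddev_put(mddev);
	return err;
}
```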
-
Committed by Zhengyuan Liu
We should update the log state after we do a log recovery; the current completion may get a wrong log state since log->log_start wasn't initialized until we called r5l_recovery_log. At the log recovery stage no lock is needed, as there is no race condition. The next_checkpoint field will be initialized in r5l_recovery_log too. Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
When recovery is complete, we write an empty block and record its position first, then finish rewriting the data-only stripes; the location of the empty block is written into the super block as the last checkpoint position. So we should update last_checkpoint to this empty block position.

     ------------------------------------------------------------------
    |  old log    |  empty block  |  data only stripes  |  invalid log |
     ------------------------------------------------------------------
    ^               ^                                   ^
    |               |- log->last_checkpoint             |- log->log_start
    |               |- log->last_cp_seq                 |- log->next_checkpoint
    |- log->seq=n                                       |- log->seq=10+n

At the same time, if there are no data-only stripes, this scene may appear:

    | meta1 | meta2 | meta3 |

meta 1 is valid, meta 2 is invalid, and meta 3 could be valid. The solution is to create a new meta in meta2 with its seq == meta1's seq + 10, and let the superblock point to meta2. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 03 Dec 2016, 1 commit
-
-
Committed by Song Liu
With writeback cache, we define log space as critical when free_space < 2 * reclaim_required_space. So R5C_LOG_CRITICAL can be deasserted when 1) free_space increases or 2) reclaim_required_space decreases. Currently, run_no_space_stripes() is called when 1) happens, but not (always) when 2) happens. With this patch, run_no_space_stripes() is called when R5C_LOG_CRITICAL is cleared. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 30 Nov 2016, 2 commits
-
-
Committed by Konstantin Khlebnikov
The current implementation employs a 16-bit counter of active stripes in the lower bits of bio->bi_phys_segments. If a request is big enough to overflow this counter, the bio will be completed and freed too early. Fortunately this does not happen in the default configuration because several other limits prevent it: stripe_cache_size * nr_disks effectively limits the count of active stripes, and the small max_sectors_kb on the underlying disks prevents it during normal read/write operations. Overflow easily happens in discard if it is enabled by the module parameter "devices_handle_discard_safely" and stripe_cache_size is set big enough. This patch limits the request size to 256Mb - 8Kb to prevent overflows (a 16-bit counter at 4KB handled per stripe covers at most 65536 * 4KB = 256MB). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Shaohua Li <shli@kernel.org> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
r5c_make_stripe_write_out() has already set this flag; there is no need to set it again. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
-