- 06 Jan 2017, 2 commits
-
-
Committed by Song Liu
We only need to update sh->log_start at the end of recovery, which happens in r5c_recovery_rewrite_data_only_stripes(), so it is not necessary to set it before that. In this patch, log_start is removed from r5c_recovery_alloc_stripe(). After updating all sh->log_start, r5c_recovery_rewrite_data_only_stripes() also updates log->next_checkpoint to the last sh->log_start. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
Write-through mode already returns earlier in the function, so there is no need to handle it again. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 04 Jan 2017, 2 commits
-
-
Committed by Robert LeBlanc
Refactor raid10_make_request into separate read and write functions to clean up the code. Shaohua: add the recovery check back to the read path. Signed-off-by: Robert LeBlanc <robert@leblancnet.us> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Robert LeBlanc
Refactor raid1_make_request to split the read and write code into their own functions to clean up the code. Signed-off-by: Robert LeBlanc <robert@leblancnet.us> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 25 Dec 2016, 1 commit
-
-
Committed by Linus Torvalds
This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Dec 2016, 2 commits
-
-
Committed by Eric Wheeler
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Wido den Hollander <wido@widodh.nl>
-
Committed by Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
- 16 Dec 2016, 1 commit
-
-
Committed by Michael S. Tsirkin
__bitwise__ used to mean "yes, please enable sparse checks unconditionally", but now that we dropped __CHECK_ENDIAN__, __bitwise is exactly the same. There aren't many users, so replace it with __bitwise everywhere. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Lee Duncan <lduncan@suse.com>
-
- 14 Dec 2016, 1 commit
-
-
Committed by Mike Snitzer
Recent dm-flakey fixes, to have reads error out during the "down" interval, made it so that the previous read behaviour is no longer available. It is useful to have reads complete like normal but have writes error out, so make that possible again with a new "error_writes" feature. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 09 Dec 2016, 25 commits
-
-
Committed by Shaohua Li
mddev->flags is used for different purposes. There are a lot of places where we check/change the flags without masking unrelated flags, so we could end up checking/changing unrelated flags. These usages are mostly for superblock writes, so separate out the superblock-related flags. This should make the code clearer and also fixes real bugs. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li
Fixes: 90f5f7ad ("md: Wait for md_check_recovery before attempting device removal.") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Shaohua Li
When we change the level from raid1 to raid5, the MD_FAILFAST_SUPPORTED bit will be accidentally set, but raid5 doesn't support it. The same is true for the MD_HAS_JOURNAL bit. Fixes: 46533ff7 ("md: Use REQ_FAILFAST_* on metadata writes where appropriate") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Mike Snitzer
Switch to using hash_32() because hash_32_generic() should only be used by the kernel's selftests. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Ondrej Kozina
Unfortunately key_string may theoretically contain whitespace even after it's processed by dm_split_args(). The reason for this is that DM core supports escaping of almost all chars, including any whitespace. If userspace passes a key to the kernel in the format ":32:logon:my_prefix:my\ key", dm-crypt will look up the key "my_prefix:my key" in the kernel keyring service. So far everything's fine. Unfortunately, if userspace later calls the DM_TABLE_STATUS ioctl, it will not receive back the expected ":32:logon:my_prefix:my\ key" but the unescaped version instead. Also, userspace (most notably cryptsetup) is not ready to parse a single target argument containing (even escaped) whitespace chars, and any whitespace is simply taken as a delimiter of another argument. This effect is mitigated by the fact that libdevmapper currently performs double escaping of the '\' char: any user input in the format "x\ x" is transformed into "x\\ x" before being passed to the kernel. Nonetheless dm-crypt may be used without libdevmapper. Therefore the near-term solution is to reject any key string containing whitespace. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Benjamin Marzinski
If no block was allocated or freed, sm_ll_mutate() wasn't setting *ev, leaving the variable uninitialized. sm_ll_insert(), sm_disk_inc_block(), and sm_disk_new_block() all check ev to see if there was an allocation event in sm_ll_mutate(), possibly reading uninitialized data. If no allocation event occurred, sm_ll_mutate() should set *ev to SM_NONE. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
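A minimal sketch of the bug class being fixed here (simplified stand-in names, not the actual dm-space-map code): an out-parameter such as *ev must be given a defined value on every path, including the "nothing changed" one, so callers never read an uninitialized variable.

```c
/* Simplified stand-in for the allocation-event enum mentioned above. */
enum allocation_event { SM_NONE, SM_ALLOC, SM_FREE };

/*
 * Pattern: set the out-parameter to a safe default first, and only
 * overwrite it when an allocation event actually happened.
 */
static int mutate_block(unsigned old_count, unsigned new_count,
                        enum allocation_event *ev)
{
	*ev = SM_NONE;			/* default: no block allocated or freed */

	if (old_count == 0 && new_count != 0)
		*ev = SM_ALLOC;
	else if (old_count != 0 && new_count == 0)
		*ev = SM_FREE;

	return 0;
}
```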
-
Committed by Benjamin Marzinski
When metadata_ll_init_index() is called by sm_ll_new_metadata(), ll->mi_le hasn't been initialized yet. So when metadata_ll_init_index() copies the contents of ll->mi_le into the newly allocated bitmap_root, it is just copying garbage. ll->mi_le will be allocated later, in sm_ll_extend(), and copied into the bitmap_root in sm_ll_commit(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Benjamin Marzinski
In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations are still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails while they are set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
Committed by Heinz Mauelshagen
Commit ecbfb9f1 ("dm raid: add raid level takeover support") moved the configure_discard_support() call from raid_ctr() to raid_preresume(). Enabling/disabling discard _must_ happen during table load (through the .ctr hook). Fix this regression by moving the configure_discard_support() call back to raid_ctr(). Fixes: ecbfb9f1 ("dm raid: add raid level takeover support") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Heinz Mauelshagen
Remove CTR_FLAG_MAX_WRITE_BEHIND from raid4/5/6's valid ctr flags. Only the md raid1 personality supports setting a maximum number of "write behind" write IOs on any legs set to "write mostly"; "write mostly" enhances throughput with slow links/disks. Technically the "write behind" value is a write intent bitmap property that is only respected by the raid1 personality. It allows a maximum number of "write behind" writes to any "write mostly" raid1 mirror legs to be delayed, and it avoids reads from such legs. No other MD personalities supported via dm-raid make use of "write behind", thus setting this property is superfluous; it wouldn't cause harm, but it is correct to reject it. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by tang.junhui
Let the requested m->hw_handler_params be used if the attached hardware handler is the same handler as requested with m->hw_handler_name. Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Ondrej Kozina
The kernel key service is a generic way to store keys for the use of other subsystems. Currently there is no way to use kernel keys in dm-crypt. This patch aims to fix that. Instead of a key, userspace may pass a key description prefixed with ':', so the message that constructs an encryption mapping now looks like this:

    <cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]

where <key_string> is in the format:

    <key_size>:<key_type>:<key_description>

Currently we only support two elementary key types: 'user' and 'logon'. Keys may be loaded into dm-crypt either via <key_string> or by using the classical method of passing the key in hex representation directly. A dm-crypt device initialised with a key passed in hex representation may be replaced with a key passed in key_string format, and vice versa. (Based on original work by Andrey Ryabinin) Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
A value is assigned to 'nr_entries' but never used, so remove it. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
Subtracting sizes is a fragile approach because the result is only correct if the compiler has not added any padding at the end of the structure. Hence use offsetof() instead of size subtraction. An additional advantage of offsetof() is that it makes the intent clearer. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
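A standalone C illustration of the point above (not the patched dm code; the struct is made up): sizeof() includes any tail padding the compiler adds, while offsetof() names the boundary of interest directly.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical header followed by a variable-length payload. */
struct packet {
	unsigned int type;
	unsigned short flags;
	char payload[];		/* flexible array member */
};

int main(void)
{
	/* Includes 2 bytes of tail padding on typical ABIs: size is 8. */
	size_t by_sizeof = sizeof(struct packet);

	/* Robust and self-documenting: the header ends where payload begins (6). */
	size_t header_len = offsetof(struct packet, payload);

	printf("sizeof: %zu, offsetof: %zu\n", by_sizeof, header_len);
	return 0;
}
```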
-
Committed by Bart Van Assche
Use a single statement to declare and initialize 'use_blk_mq' instead of two statements. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
After QUEUE_FLAG_DYING has been set, any code that is waiting in get_request() should be woken up. But to get this behaviour, blk_set_queue_dying() must be used instead of only setting QUEUE_FLAG_DYING. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using a GFP_NOIO allocation (the lock is dropped because the allocation can take some time). Note that we won't do the GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
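A condensed sketch of the allocation pattern described above (illustrative only; 'cache', 'entry' and alloc_entry() are made-up names, not the real dm-bufio symbols): try a non-blocking allocation while holding the lock, and fall back to a blocking GFP_NOIO allocation only after dropping it.

```c
#include <linux/mutex.h>
#include <linux/slab.h>

struct entry {
	void *data;
};

struct cache {
	struct mutex lock;
};

/* Called with c->lock held. */
static struct entry *alloc_entry(struct cache *c)
{
	struct entry *e;

	/* Fast path: we must not sleep while the lock is held. */
	e = kmalloc(sizeof(*e), GFP_NOWAIT);
	if (e)
		return e;

	/*
	 * Slow path: drop the lock so the blocking, I/O-less allocation
	 * (GFP_NOIO) can take its time without stalling other lock users.
	 */
	mutex_unlock(&c->lock);
	e = kmalloc(sizeof(*e), GFP_NOIO);
	mutex_lock(&c->lock);

	return e;	/* may still be NULL; callers must handle that */
}
```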
-
Committed by Mikulas Patocka
dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects there are. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Douglas Anderson
We've seen in-field reports showing _lots_ (18 in one case, 41 in another) of tasks all sitting there blocked on:

    mutex_lock+0x4c/0x68
    dm_bufio_shrink_count+0x38/0x78
    shrink_slab.part.54.constprop.65+0x100/0x464
    shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

    Workqueue: kverityd verity_prefetch_io
    __switch_to+0x9c/0xa8
    __schedule+0x440/0x6d8
    schedule+0x94/0xb4
    schedule_timeout+0x204/0x27c
    schedule_timeout_uninterruptible+0x44/0x50
    wait_iff_congested+0x9c/0x1f0
    shrink_inactive_list+0x3a0/0x4cc
    shrink_lruvec+0x418/0x5cc
    shrink_zone+0x88/0x198
    try_to_free_pages+0x51c/0x588
    __alloc_pages_nodemask+0x648/0xa88
    __get_free_pages+0x34/0x7c
    alloc_buffer+0xa4/0x144
    __bufio_new+0x84/0x278
    dm_bufio_prefetch+0x9c/0x154
    verity_prefetch_io+0xe8/0x10c
    process_one_work+0x240/0x424
    worker_thread+0x2fc/0x424
    kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem can be reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem.
1. Pick the test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from http://crbug.com/468342 (that's just a memory stress test app).
3. On a 4GB rk3399 machine, run: nice ./launchBalloons.sh 4 900 100000 — that tries to eat 4 * 900 MB of memory and keep accessing it.
4. Log in to the Chrome web browser and restore many tabs.

With that, I've seen printouts like:

    DOUG: long bufio 90758 ms

...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while we're holding the dm_bufio lock. Instead we should be using GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the lock, and that causes the above problems.

The current behavior, explained by David Rientjes: it will still try reclaim initially because __GFP_WAIT (or __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of contention on dm_bufio_lock() that the thread holds. You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a mutex that can be contended by a concurrent slab shrinker (if count_objects didn't use a trylock, this pattern would trivially deadlock).

This change significantly increases responsiveness of the system while in this state. It makes a real difference because it unblocks kswapd. In the bug report analyzed, kswapd was hung:

    kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
    Call trace:
    [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
    [<ffffffc00090b794>] __schedule+0x440/0x6d8
    [<ffffffc00090bac0>] schedule+0x94/0xb4
    [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
    [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
    [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
    [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
    [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
    [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
    [<ffffffc00030e578>] balance_pgdat+0x328/0x508
    [<ffffffc00030eb7c>] kswapd+0x424/0x51c
    [<ffffffc00023f06c>] kthread+0x10c/0x114
    [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd, memory pressure should be reduced. Suggested-by: David Rientjes <rientjes@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
Use a single loop instead of two loops to determine whether or not all_blk_mq has to be set. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Bart Van Assche
When dm_table_set_type() is used by a target to establish a DM table's type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath), the DM core must go on to verify that the devices in the table are compatible with the established type. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
An earlier DM multipath table could have been built on top of underlying devices that were all using blk-mq. In that case, if that active multipath table is replaced with an empty DM multipath table (reflecting that all paths have failed), then it is important that the 'all_blk_mq' state of the active table is transferred to the new empty DM table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a request that isn't needed by the DM multipath target when it is to issue IO to an underlying blk-mq device. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Song Liu
Currently, we increase the journal entry seq by 10 after recovery. However, this is not sufficient in the following case. After a crash, the journal looks like:

    | seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we drop all entries from +1 to +12 and write seq+10:

    | seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we then write a big journal entry with seq+11, it will connect with the stale journal entries:

    | seq+0 | +10 | +11 | +12 |

To reduce the risk of this issue, we increase the seq by 10000 instead. Shaohua: use 10000 instead of 1000. The risk should be very unlikely: the total stripe cache size is typically less than 2k, and several stripes can fit into one metadata block, so the total number of in-flight metadata blocks would be quite small, which means the total number of sequence numbers used should be quite small. The 10000 sequence number increase should be far more than safe. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Song Liu
r5l_recovery_create_empty_meta_block() creates a crc for the empty meta block. After the meta block is updated, we need to clear the checksum before recalculating it. Shaohua: moved the checksum calculation out of r5l_recovery_create_empty_meta_block; we should calculate it after all fields are updated. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
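A generic sketch of the ordering this fix enforces (hypothetical struct and field names, not the raid5-cache definitions): zero the stale checksum field first, and compute the CRC only after every other field has its final value.

```c
#include <linux/kernel.h>
#include <linux/crc32c.h>

/* Hypothetical on-disk block layout, used only for this illustration. */
struct meta_block {
	__le64 seq;
	__le64 position;
	__le32 checksum;	/* crc32c over the whole block */
} __packed;

static void meta_block_finalize(struct meta_block *mb, u32 seed)
{
	/* Clear any stale checksum so it does not feed into the new CRC. */
	mb->checksum = 0;

	/* Compute the CRC last, after all other fields are final. */
	mb->checksum = cpu_to_le32(crc32c(seed, mb, sizeof(*mb)));
}
```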
-
Committed by JackieLiu
When creating the super-block information, we do not need to go through the recovery stage; we only need to initialize some variables. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 06 Dec 2016, 3 commits
-
-
Committed by NeilBrown
md_open() gets a counted reference on an mddev using mddev_find(). If it ends up returning an error, it must drop this reference. There are two error paths where the reference is not dropped. One only happens if the process is signalled at an awkward time, which is quite unlikely. The other was introduced recently in commit af8d8e6f. Change the code to ensure we drop the reference when returning an error, and make it harder to re-introduce this sort of bug in the future. Reported-by: Marc Smith <marc.smith@mcc.edu> Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
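A sketch of the error-path discipline this fix enforces (simplified, not the literal md.c code; example_open(), some_transient_condition() and do_open() are hypothetical names): once mddev_find() has taken a reference, every early return must pass through a path that calls mddev_put().

```c
/* Sketch only: funnel all error returns through a single label. */
static int example_open(struct block_device *bdev, fmode_t mode)
{
	struct mddev *mddev = mddev_find(bdev->bd_dev);
	int err;

	if (!mddev)
		return -ENODEV;		/* nothing to drop yet */

	if (some_transient_condition(mddev)) {
		err = -EBUSY;
		goto out_put;		/* must not leak the reference */
	}

	err = do_open(mddev, mode);
	if (err)
		goto out_put;

	return 0;			/* success: keep the reference until release */

 out_put:
	mddev_put(mddev);
	return err;
}
```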
-
Committed by Zhengyuan Liu
We should update the log state after we do a log recovery; the current completion may get a wrong log state since log->log_start wasn't initialized until we called r5l_recovery_log. At the log recovery stage no lock is needed, as there is no race condition. The next_checkpoint field will be initialized in r5l_recovery_log too. Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
When recovery is complete, we write an empty block and record its position first, then finish rewriting the data-only stripes; the location of the empty block is written into the super block as the last checkpoint position. So we should update last_checkpoint to this empty block position.

     ------------------------------------------------------------------
    |  old log    |  empty block  |  data only stripes  |  invalid log |
     ------------------------------------------------------------------
    ^               ^                                   ^
    |               |- log->last_checkpoint             |- log->log_start
    |               |- log->last_cp_seq                 |- log->next_checkpoint
    |- log->seq=n                                       |- log->seq=10+n

At the same time, if there are no data-only stripes, this scene may appear:

    | meta1 | meta2 | meta3 |

meta 1 is valid, meta 2 is invalid, and meta 3 could be valid. The solution is to create a new meta in meta2 with its seq == meta1's seq + 10, and let the superblock point to meta2. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Reviewed-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 03 Dec 2016, 1 commit
-
-
Committed by Song Liu
With writeback cache, we define log space as critical when free_space < 2 * reclaim_required_space. So R5C_LOG_CRITICAL can be deasserted when 1) free_space increases or 2) reclaim_required_space decreases. Currently, run_no_space_stripes() is called when 1) happens, but not (always) when 2) happens. With this patch, run_no_space_stripes() is called when R5C_LOG_CRITICAL is cleared. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
-
- 30 Nov 2016, 2 commits
-
-
Committed by Konstantin Khlebnikov
The current implementation employs a 16-bit counter of active stripes in the lower bits of bio->bi_phys_segments. If a request is big enough to overflow this counter, the bio will be completed and freed too early. Fortunately this does not happen in the default configuration because several other limits prevent it: stripe_cache_size * nr_disks effectively limits the count of active stripes, and the small max_sectors_kb on the underlying disks prevents it during normal read/write operations. Overflow easily happens in discard if it is enabled by the module parameter "devices_handle_discard_safely" and stripe_cache_size is set big enough. This patch limits the request size to 256Mb - 8Kb to prevent overflows (a 16-bit counter at 4KB handled per stripe covers at most 65536 * 4KB = 256MB). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Shaohua Li <shli@kernel.org> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by JackieLiu
r5c_make_stripe_write_out() has already set this flag; there is no need to set it again. Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
-