提交 · 47f521ba18190e4bfbb65ead3977af5756884427 · openeuler / Kernel

11 11月, 2017 24 次提交

M
dm cache: lift common migration preparation code to alloc_migration() · ef7afb36
由 Mike Snitzer 提交于 11月 09, 2017
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
ef7afb36
J
dm cache: remove usused deferred_cells member from struct cache · ede6507d
由 Joe Thornber 提交于 11月 09, 2017
```
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
ede6507d

dm cache policy smq: allocate cache blocks in order · 9768a10d

由 Joe Thornber 提交于 11月 09, 2017

Previously, cache blocks were being allocated in reverse order.  Fix
this by pulling the block off the head of the free list.

Shouldn't have any impact on performance or latency but it is more
correct to have the cache blocks allocated/mapped in ascending order.
This fix will slightly increase the chances of two adjacent oblocks
being in adjacent cblocks.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9768a10d

dm cache policy smq: change max background work from 10240 to 4096 blocks · 8ee18ede

由 Joe Thornber 提交于 11月 09, 2017

10240 blocks was too much, lowering this reduces the latency of copying
and consumes less memory.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

8ee18ede

dm cache background tracker: limit amount of background work that may be issued at once · 64748b16

由 Joe Thornber 提交于 11月 08, 2017

On large systems the cache policy can be over enthusiastic and queue far
too much dirty data to be written back.  This consumes memory.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

64748b16

dm cache policy smq: take origin idle status into account when queuing writebacks · deb71918

由 Joe Thornber 提交于 11月 08, 2017

If the origin device is idle try and writeback more data.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

deb71918

dm cache policy smq: handle races with queuing background_work · 1e72a8e8

由 Joe Thornber 提交于 11月 08, 2017

The background_tracker holds a set of promotions/demotions that the
cache policy wishes the core target to implement.

When adding a new operation to the tracker it's possible that an
operation on the same block is already present (but in practise this
doesn't appear to be happening).  Catch these situations and do the
appropriate cleanup.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1e72a8e8

dm raid: fix panic when attempting to force a raid to sync · 23397844

由 Heinz Mauelshagen 提交于 11月 02, 2017

Requesting a sync on an active raid device via a table reload
(see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
skips the super_load() call that defines the superblock size
(rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
is called.

Fix by moving the initialization of the superblock start and size
out of super_load() to the caller (analyse_superblocks).
Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

23397844

dm integrity: allow unaligned bv_offset · 95b1369a

由 Mikulas Patocka 提交于 11月 07, 2017

When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-integrity checks if bv_offset is aligned on page size and this check
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset check, leaving only the check for
bv_len.

Fixes: 7eada909 ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: NBruno Prémont <bonbons@sysophe.eu>
Reviewed-by: NMilan Broz <gmazyland@gmail.com>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

95b1369a

dm crypt: allow unaligned bv_offset · 0440d5c0

由 Mikulas Patocka 提交于 11月 07, 2017

When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).

Fixes: 8f0009a2 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: NBruno Prémont <bonbons@sysophe.eu>
Tested-by: NBruno Prémont <bonbons@sysophe.eu>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NMilan Broz <gmazyland@gmail.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

0440d5c0

dm: small cleanup in dm_get_md() · 49de5769

由 Mike Snitzer 提交于 11月 06, 2017

Makes dm_get_md() and dm_get_from_kobject() have similar code.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

49de5769

dm: fix race between dm_get_from_kobject() and __dm_destroy() · b9a41d21

由 Hou Tao 提交于 11月 01, 2017

The following BUG_ON was hit when testing repeat creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
     [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
     [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
     [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
     [<ffffffff811de257>] kernfs_seq_show+0x23/0x25
     [<ffffffff81199118>] seq_read+0x16f/0x325
     [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
     [<ffffffff8117b625>] __vfs_read+0x26/0x9d
     [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
     [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
     [<ffffffff8117be9d>] vfs_read+0x8f/0xcf
     [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
     [<ffffffff8117c686>] SyS_read+0x4b/0x76
     [<ffffffff817b606e>] system_call_fastpath+0x12/0x71

The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.

Cc: stable@vger.kernel.org
Signed-off-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

b9a41d21

dm: allocate struct mapped_device with kvzalloc · 856eb091

由 Mikulas Patocka 提交于 10月 31, 2017

The structure srcu_struct can be very big, its size is proportional to the
value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
io_barrier in the struct mapped_device has 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may result in failure
of the function kzalloc_node.

In order to avoid the allocation failure, we use the function
kvzalloc_node, this function falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.

Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

856eb091

dm zoned: ignore last smaller runt zone · 114e0259

由 Damien Le Moal 提交于 10月 28, 2017

The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.

Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.

While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().
Reported-by: NPeter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

114e0259

dm space map metadata: use ARRAY_SIZE · fbc61291

由 Jérémy Lefaure 提交于 10月 01, 2017

Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)
Signed-off-by: NJérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

fbc61291

dm log writes: add support for DAX · 98d82f48

由 Ross Zwisler 提交于 10月 19, 2017

Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Unfortunately we can't easily track data that has been written via
mmap() now that the dax_flush() abstraction was removed by commit
c3ca015f ("dax: remove the pmem_dax_ops->flush abstraction").
Otherwise we could just treat each flush as a big write, and store the
data that is being synced to media.  It may be worthwhile to add the
dax_flush() entry point back, just as a notifier so we can do this
logging.
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

98d82f48

dm log writes: add support for inline data buffers · e5a20660

由 Ross Zwisler 提交于 10月 19, 2017

Currently dm-log-writes supports writing filesystem data via BIOs, and
writing internal metadata from a flat buffer via write_metadata().

For DAX writes, though, we won't have a BIO, but will instead have an
iterator that we'll want to use to fill a flat data buffer.

So, create write_inline_data() which allows us to write filesystem data
using a flat buffer as a source, and wire it up in log_one_block().
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e5a20660

dm cache: simplify get_per_bio_data() by removing data_size argument · 693b960e

由 Mike Snitzer 提交于 10月 19, 2017

There is only one per_bio_data size now that writethrough-specific data
was removed from the per_bio_data structure.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

693b960e

dm cache: remove all obsolete writethrough-specific code · 9958f1d9

由 Mike Snitzer 提交于 10月 19, 2017

Now that the writethrough code is much simpler there is no need to track
so much state or cascade bio submission (as was done, via
writethrough_endio(), to issue origin then cache IO in series).

As such the obsolete writethrough list and workqueue is also removed.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9958f1d9

dm cache: submit writethrough writes in parallel to origin and cache · 2df3bae9

由 Mike Snitzer 提交于 10月 19, 2017

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2df3bae9

dm cache: pass cache structure to mode functions · 8e3c3827

由 Mike Snitzer 提交于 10月 19, 2017

No functional changes, just a bit cleaner than passing cache_features
structure.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

8e3c3827

dm cache: fix race condition in the writeback mode overwrite_bio optimisation · d1260e2a

由 Joe Thornber 提交于 11月 10, 2017

When a DM cache in writeback mode moves data between the slow and fast
device it can often avoid a copy if the triggering bio either:

i) covers the whole block (no point copying if we're about to overwrite it)
ii) the migration is a promotion and the origin block is currently discarded

Prior to this fix there was a race with case (ii).  The discard status
was checked with a shared lock held (rather than exclusive).  This meant
another bio could run in parallel and write data to the origin, removing
the discard state.  After the promotion the parallel write would have
been lost.

With this fix the discard status is re-checked once the exclusive lock
has been aquired.  If the block is no longer discarded it falls back to
the slower full copy path.

Fixes: b29d4986 ("dm cache: significant rework to leverage dm-bio-prison-v2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d1260e2a

md: free unused memory after bitmap resize · 0868b99c

由 Zdenek Kabelac 提交于 11月 08, 2017

When bitmap is resized, the old kalloced chunks just are not released
once the resized bitmap starts to use new space.

This fixes in particular kmemleak reports like this one:

unreferenced object 0xffff8f4311e9c000 (size 4096):
  comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
  hex dump (first 32 bytes):
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
  backtrace:
    [<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
    [<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
    [<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
    [<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
    [<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
    [<ffffffffa676f1ae>] bitmap_load+0xbe/0x160
    [<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
    [<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
    [<ffffffffa6774b52>] dm_resume+0x122/0x140
    [<ffffffffa6779b9f>] dev_suspend+0x18f/0x290
    [<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
    [<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
    [<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
    [<ffffffffa62d7269>] SyS_ioctl+0x79/0x90
    [<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Signed-off-by: NZdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

0868b99c

md: release allocated bitset sync_set · 0202ce8a

由 Zdenek Kabelac 提交于 11月 08, 2017

Patch fixes kmemleak on md_stop() path used likely only by dm-raid wrapper.
Code of md is using  mddev_put() where both bitsets are released however this
freeing is not shared.

Also set NULL to bio_set and sync_set pointers just like mddev_put is
doing.
Signed-off-by: NZdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

0202ce8a

09 11月, 2017 2 次提交

md/bitmap: clear BITMAP_WRITE_ERROR bit before writing it to sb · 97f0eb9f

由 Hou Tao 提交于 11月 06, 2017

For a RAID1 device using a file-based bitmap, if a bitmap write error
occurs but the later writes succeed, it's possible both BITMAP_STALE
and BITMAP_WRITE_ERROR bits will be written to the bitmap super block,
the BITMAP_STALE bit will be handled properly and be cleared, but the
BITMAP_WRITE_ERROR bit in sb->flags will make bitmap_create() to fail.

So clear it to protect against the write failure-and-then-recovery case.
Signed-off-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NShaohua Li <shli@fb.com>

97f0eb9f

md: be cautious about using ->curr_resync_completed for ->recovery_offset · db0505d3

由 NeilBrown 提交于 10月 17, 2017

The ->recovery_offset shows how much of a non-InSync device is actually
in sync - how much has been recoveryed.

When performing a recovery, ->curr_resync and ->curr_resync_completed
follow the device address being recovered and so can be used to update
->recovery_offset.

When performing a reshape, ->curr_resync* might follow the device
addresses (raid5) or might follow array addresses (raid10), so cannot
in general be used to set ->recovery_offset.  When reshaping backwards,
->curre_resync* measures from the *end* of the array-or-device, so is
particularly unhelpful.

So change the common code in md.c to only use ->curr_resync_complete
for the simple recovery case, and add code to raid5.c to update
->recovery_offset during a forwards reshape.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

db0505d3

03 11月, 2017 1 次提交

dm: move dm-verity to generic async completion · 12f1ffc4

由 Gilad Ben-Yossef 提交于 10月 18, 2017

dm-verity is starting async. crypto ops and waiting for them to complete.
Move it over to generic code doing the same.

This also avoids a future potential data coruption bug created
by the use of wait_for_completion_interruptible() without dealing
correctly with an interrupt aborting the wait prior to the
async op finishing, should this code ever move to a context
where signals are not masked.
Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
CC: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

12f1ffc4

02 11月, 2017 13 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

md: don't check MD_SB_CHANGE_CLEAN in md_allow_write · b90f6ff0

由 Artur Paszkiewicz 提交于 10月 26, 2017

Only MD_SB_CHANGE_PENDING should be used to wait for transition from
clean to dirty. Checking also MD_SB_CHANGE_CLEAN is unnecessary and can
race with e.g. md_do_sync(). This sporadically causes a hang when
changing consistency policy during resync:

INFO: task mdadm:6183 blocked for more than 30 seconds.
      Not tainted 4.14.0-rc3+ #391
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdadm           D12752  6183   6022 0x00000000
Call Trace:
 __schedule+0x93f/0x990
 schedule+0x6b/0x90
 md_allow_write+0x100/0x130 [md_mod]
 ? do_wait_intr_irq+0x90/0x90
 resize_stripes+0x3a/0x5b0 [raid456]
 ? kernfs_fop_write+0xbe/0x180
 raid5_change_consistency_policy+0xa6/0x200 [raid456]
 consistency_policy_store+0x2e/0x70 [md_mod]
 md_attr_store+0x90/0xc0 [md_mod]
 sysfs_kf_write+0x42/0x50
 kernfs_fop_write+0x119/0x180
 __vfs_write+0x28/0x110
 ? rcu_sync_lockdep_assert+0x12/0x60
 ? __sb_start_write+0x15a/0x1c0
 ? vfs_write+0xa3/0x1a0
 vfs_write+0xb4/0x1a0
 SyS_write+0x49/0xa0
 entry_SYSCALL_64_fastpath+0x18/0xad

Fixes: 2214c260 ("md: don't return -EAGAIN in md_allow_write for external metadata arrays")
Cc: <stable@vger.kernel.org>
Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

b90f6ff0

G
md-cluster: update document for raid10 · f0e230ad
由 Guoqing Jiang 提交于 10月 24, 2017
```
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>
```
f0e230ad

md: remove redundant variable q · fc33060b

由 Colin Ian King 提交于 10月 27, 2017

The pointer q is assigned but never read; it is redundant and can
be removed.  Cleans up clang warning:

drivers/md/md-multipath.c:260:4: warning: Value stored to 'q' is
never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NShaohua Li <shli@fb.com>

fc33060b

raid1: remove obsolete code in raid1_write_request · f81f7302

由 Guoqing Jiang 提交于 10月 24, 2017

There are some lines could be removed due to recent
change for raid1 such as commit 3956df15d634 ("md:
move suspend_hi/lo handling into core md code").

Also, seems some comments are put to wrong place,
move them before wait_barrier.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

f81f7302

md-cluster: Use a small window for raid10 resync · 8db87912

由 Guoqing Jiang 提交于 10月 24, 2017

Suspending the entire device for resync could take
too long. Resync in small chunks.

cluster's resync window is maintained in r10conf as
cluster_sync_low and cluster_sync_high, and processed
in raid10's sync_request(). If the current resync is
outside the cluster resync window:

1. Set the cluster_sync_low to curr_resync_completed.
2. Set cluster_sync_high to cluster_sync_low + stripe
   size.
3. Send a message to all nodes so they may add it in
   their suspension list.

Note:
We only support "near" raid10 so far, resync a far or
offset raid10 array could have trouble. So raid10_run
checks the layout of clustered raid10, it will refuse
to run if the layout is not correct.

With the "near" layout we process one stripe at a time
progressing monotonically through the address space.
So we can have a sliding window of whole-stripes which
moves through the array suspending IO on other nodes,
and both resync which uses array addresses and recovery
which uses device addresses can stay within this window.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

8db87912

md-cluster: Suspend writes in RAID10 if within range · cb8a7a7e

由 Guoqing Jiang 提交于 10月 24, 2017

If there is a resync going on, all nodes must suspend
writes to the range. This is recorded in suspend_info
and suspend_list.

If there is an I/O within the ranges of any of the
suspend_info, area_resyncing will return 1.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

cb8a7a7e

md-cluster/raid10: set "do_balance = 0" if area is resyncing · d4098c72

由 Guoqing Jiang 提交于 10月 24, 2017

Just like clustered raid1, it is impossible for cluster raid10
to choose the best device for read balance when the area of
array is resyncing. Because we cannot trust the data to be the
same on all devices at that time, so we choose just the first
one to use, so set do_balance to 0.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

d4098c72

md: use lockdep_assert_held · efa4b77b

由 Shaohua Li 提交于 10月 18, 2017

lockdep_assert_held is a better way to assert lock held, and it works
for UP.
Signed-off-by: NShaohua Li <shli@fb.com>

efa4b77b

raid1: prevent freeze_array/wait_all_barriers deadlock · f6eca2d4

由 Nate Dailey 提交于 10月 17, 2017

If freeze_array is attempted in the middle of close_sync/
wait_all_barriers, deadlock can occur.

freeze_array will wait for nr_pending and nr_queued to line up.
wait_all_barriers increments nr_pending for each barrier bucket, one
at a time, but doesn't actually issue IO that could be counted in
nr_queued. So freeze_array is blocked until wait_all_barriers
completes and allow_all_barriers runs. At the same time, when
_wait_barrier sees array_frozen == 1, it stops and waits for
freeze_array to complete.

Prevent the deadlock by making close_sync call _wait_barrier and
_allow_barrier for one bucket at a time, instead of deferring the
_allow_barrier calls until after all _wait_barriers are complete.
Signed-off-by: NNate Dailey <nate.dailey@stratus.com>
Fix: fd76863e(RAID1: a new I/O barrier implementation to remove resync window)
Reviewed-by: NColy Li <colyli@suse.de>
Cc: stable@vger.kernel.org (v4.11)
Signed-off-by: NShaohua Li <shli@fb.com>

f6eca2d4

md: use TASK_IDLE instead of blocking signals · ae89fd3d

由 Mikulas Patocka 提交于 10月 18, 2017

Hi - I submit this patch for the next merge window:

Some times ago, I made a patch f9c79bc0 that blocks signals around the
schedule() calls in MD. The MD subsystem needs to do an uninterruptible
sleep that is not accounted in load average - so we block signals and use
interruptible sleep.

The kernel has a special TASK_IDLE state for this purpose, so we can use
it instead of blocking signals. This patch doesn't fix any bug, it just
makes the code simpler.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Acked-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

ae89fd3d

md: remove special meaning of ->quiesce(.., 2) · b03e0ccb

由 NeilBrown 提交于 10月 19, 2017

The '2' argument means "wake up anything that is waiting".
This is an inelegant part of the design and was added
to help support management of suspend_lo/suspend_hi setting.
Now that suspend_lo/hi is managed in mddev_suspend/resume,
that need is gone.
These is still a couple of places where we call 'quiesce'
with an argument of '2', but they can safely be changed to
call ->quiesce(.., 1); ->quiesce(.., 0) which
achieve the same result at the small cost of pausing IO
briefly.

This removes a small "optimization" from suspend_{hi,lo}_store,
but it isn't clear that optimization served a useful purpose.
The code now is a lot clearer.
Suggested-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

b03e0ccb

md: allow metadata update while suspending. · 35bfc521

由 NeilBrown 提交于 10月 17, 2017

There are various deadlocks that can occur
when a thread holds reconfig_mutex and calls
->quiesce(mddev, 1).
As some write request block waiting for
metadata to be updated (e.g. to record device
failure), and as the md thread updates the metadata
while the reconfig mutex is held, holding the mutex
can stop write requests completing, and this prevents
->quiesce(mddev, 1) from completing.

->quiesce() is now usually called from mddev_suspend(),
and it is always called with reconfig_mutex held.  So
at this time it is safe for the thread to update metadata
without explicitly taking the lock.

So add 2 new flags, one which says the unlocked updates is
allowed, and one which ways it is happening.  Then allow it
while the quiesce completes, and then wait for it to finish.
Reported-and-tested-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

35bfc521

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功