1. 04 Dec, 2017 2 commits
    • M
      dm: fix various targets to dm_register_target after module __init resources created · 7e6358d2
      Committed by monty_pavel@sina.com
      A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
      processes race to load the dm-thin-pool module:
      
       PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
        #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
        #1 [ffff883cd743d660] crash_kexec at ffffffff810c5992
        #2 [ffff883cd743d730] oops_end at ffffffff81515c90
        #3 [ffff883cd743d760] no_context at ffffffff81049f1b
        #4 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
        #5 [ffff883cd743d800] bad_area at ffffffff8104a2ce
        #6 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
        #7 [ffff883cd743d950] do_page_fault at ffffffff81517bae
        #8 [ffff883cd743d980] page_fault at ffffffff81514f95
           [exception RIP: kmem_cache_alloc+108]
           RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046
           RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0
           RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
           RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000
           R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0
           R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246
           ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
        #9 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
       #10 [ffff883cd743da80] mempool_create_node at ffffffff81122083
       #11 [ffff883cd743dad0] mempool_create at ffffffff811220f4
       #12 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
       #13 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
       #14 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
       #15 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]
      
      The race results in a NULL pointer because:
      
      Process A (vgchange -ay -K):
       	a. send DM_LIST_VERSIONS_CMD ioctl;
       	b. pool_target not registered;
       	c. modprobe dm_thin_pool and wait until end.
      
      Process B (vgchange -ay -K):
       	a. send DM_LIST_VERSIONS_CMD ioctl;
       	b. pool_target registered;
       	c. table_load->dm_table_add_target->pool_ctr;
       	d. _new_mapping_cache is NULL and panic.
      Note:
       	1. process A and process B are two concurrent processes.
       	2. pool_target can be detected by process B but
       	_new_mapping_cache initialization has not ended.
      
      To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
      with the same problem, simply dm_register_target() after all resources
      created during module init (as labelled with __init) are finished.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: monty <monty_pavel@sina.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      7e6358d2
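The ordering rule described in this commit can be illustrated with a small userspace model. This is only a sketch: `toy_target`, `toy_ctr()`, `toy_init()` and `toy_exit()` are hypothetical names, not kernel APIs, and the `registered` flag merely stands in for the effect of dm_register_target().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the module's state; not kernel APIs. */
struct toy_target {
    int registered;       /* other processes can see the target once set */
    void *mapping_cache;  /* resource the constructor depends on */
};

/* Constructor path: reachable as soon as the target is registered. */
static int toy_ctr(const struct toy_target *t)
{
    if (!t->registered)
        return -1;                      /* target not visible yet */
    return t->mapping_cache ? 0 : -2;   /* -2 models the NULL dereference */
}

/* Fixed ordering: create every resource first, register last. */
static int toy_init(struct toy_target *t)
{
    t->mapping_cache = malloc(64);
    if (!t->mapping_cache)
        return -1;
    t->registered = 1;
    return 0;
}

/* Teardown mirrors init: unregister first, then free resources. */
static void toy_exit(struct toy_target *t)
{
    t->registered = 0;
    free(t->mapping_cache);
    t->mapping_cache = NULL;
}
```

With this ordering, any caller that observes `registered` as set also finds a non-NULL `mapping_cache`, which is the property the race in the commit message violated.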
    • M
      dm table: fix regression from improper dm_dev_internal.count refcount_t conversion · afc567a4
      Committed by Mike Snitzer
      Multiple refcounts are needed if the device was already added.  The
      micro-optimization of setting the refcount to 1 on first added (rather
      than fall thru to a common refcount_inc) lost sight of the fact that the
      refcount_inc is also needed for the case when the device already exists
      and the mode need not be upgraded.
      
      Fixes: 2a0b4682 ("dm: convert dm_dev_internal.count from atomic_t to refcount_t")
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      afc567a4
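The counting rule from this commit can be sketched as a simplified model. It assumes a plain unsigned counter in place of refcount_t, and `toy_dev` and `toy_get_device()` are hypothetical names rather than the kernel's own.

```c
#include <stddef.h>

/* Hypothetical model of one table entry; only the counting logic matters. */
struct toy_dev {
    int present;     /* 0 until the device is first added */
    unsigned mode;   /* bitmask of granted access modes */
    unsigned count;  /* stands in for the refcount_t */
};

/* Take one reference per call, whether or not the device already exists. */
static void toy_get_device(struct toy_dev *dd, unsigned mode)
{
    if (!dd->present) {
        dd->present = 1;
        dd->mode = mode;
        dd->count = 1;      /* first add: set the count to 1 and return */
        return;
    }
    if (dd->mode != (dd->mode | mode))
        dd->mode |= mode;   /* upgrade the mode only when needed */
    dd->count++;            /* existing device: always take a reference */
}
```

The point of the sketch is the final increment: it runs both when the mode was upgraded and when no upgrade was needed, which is the case the regression missed.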
  2. 25 Nov, 2017 4 commits
    • M
      bcache: check return value of register_shrinker · 6c4ca1e3
      Committed by Michael Lyle
      register_shrinker is now __must_check, so check it to kill a warning.
      Caller of bch_btree_cache_alloc in super.c appropriately checks return
      value so this is fully plumbed through.
      
      This V2 fixes checkpatch warnings and improves the commit description,
      as I was too hasty getting the previous version out.
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Vojtech Pavlik <vojtech@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6c4ca1e3
    • R
      bcache: recover data from backing when data is clean · e393aa24
      Committed by Rui Hua
      When we send a read request and hit clean data in the cache device, there
      is a situation called a cache read race in bcache (see the comment at the
      tail of cache_look_up(); the following explanation is copied from there):
      The bucket we're reading from might be reused while our bio is in flight,
      and we could then end up reading the wrong data. We guard against this
      by checking (in bch_cache_read_endio()) if the pointer is stale again;
      if so, we treat it as an error (s->iop.error = -EINTR) and reread from
      the backing device (but we don't pass that error up anywhere)
      
      Note that a cache read race happens under normal circumstances, not
      when the SSD has failed; it is counted and shown in
      /sys/fs/bcache/XXX/internal/cache_read_races.
      
      Without this patch, in writeback mode we never reread from the backing
      device when a cache read race happens until the whole cache device is
      clean, because the condition
      (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
      cached_dev_read_error(). In this situation s->iop.error (= -EINTR) is
      passed up, and the user finally receives -EINTR when its bio ends. This
      is not appropriate and looks weird to the upper-layer application.
      
      In this patch, we use s->read_dirty_data to judge whether the read
      request hit dirty data in the cache device. It is safe to reread data
      from the backing device when the read request hit clean data. This not
      only handles the cache read race, but also recovers data when a read
      request to the cache device fails.
      
      [edited by mlyle to fix up whitespace, commit log title, comment
      spelling]
      
      Fixes: d59b2379 ("bcache: only permit to recovery read error when cache device is clean")
      Cc: <stable@vger.kernel.org> # 4.14
      Signed-off-by: Hua Rui <huarui.dev@gmail.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e393aa24
    • H
      bcache: Fix building error on MIPS · cf33c1ee
      Committed by Huacai Chen
      This patch tries to fix the build error on MIPS. The reason is MIPS
      has already defined the PTR macro, which conflicts with the PTR macro
      in include/uapi/linux/bcache.h.
      
      [fixed by mlyle: corrected a line-length issue]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cf33c1ee
    • T
      bcache: add a comment in journal bucket reading · bb22cafd
      Committed by Tang Junhui
      Journal buckets form a circular buffer. The buckets can look like
      YYYNNNYY, which means the first valid journal is in the 7th bucket and
      the latest valid journal is in the third bucket. In this case, if we
      do not try the zero index first, we may get a valid journal in the 7th
      bucket, then call
      find_next_bit(bitmap, ca->sb.njournal_buckets, l + 1) to get the
      first invalid bucket after the 7th bucket. Because all these
      buckets are valid, no bit is set in the bitmap, so find_next_bit()
      returns ca->sb.njournal_buckets (8). After that, bcache only reads
      the journal in the 7th and 8th buckets, and the first to third
      buckets are lost.
      
      So it is important to let developers know that we need to try
      the zero index first in the hash search, and to avoid breaking
      this in future code changes.
      
      [ML: Fixed whitespace & formatting & file permissions]
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bb22cafd
  3. 17 Nov, 2017 5 commits
    • E
      dm bufio: fix integer overflow when limiting maximum cache size · 74d4108d
      Committed by Eric Biggers
      The default max_cache_size_bytes for dm-bufio is meant to be the lesser
      of 25% of the size of the vmalloc area and 2% of the size of lowmem.
      However, on 32-bit systems the intermediate result in the expression
      
          (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
      
      overflows, causing the wrong result to be computed.  For example, on a
      32-bit system where the vmalloc area is 520093696 bytes, the result is
      1174405 rather than the expected 130023424, which makes the maximum
      cache size much too small (far less than 2% of lowmem).  This causes
      severe performance problems for dm-verity users on affected systems.
      
      Fix this by using mult_frac() to correctly multiply by a percentage.  Do
      this for all places in dm-bufio that multiply by a percentage.  Also
      replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
      to the comment is now defined in include/linux/vmalloc.h.
      
      Depends-on: 9993bc63 ("sched/x86: Fix overflow in cyc2ns_offset")
      Fixes: 95d402f0 ("dm: add bufio")
      Cc: <stable@vger.kernel.org> # v3.2+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      74d4108d
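The overflow and its fix can be reproduced in userspace with 32-bit arithmetic. The sketch below uses the numbers quoted in the commit message; `percent_naive()` and `percent_mult_frac()` are hypothetical helper names, and the second one follows the same quotient-and-remainder idea as the kernel's mult_frac() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Naive percentage: x * percent can wrap around in 32 bits. */
static uint32_t percent_naive(uint32_t x, uint32_t percent)
{
    return x * percent / 100;
}

/* Same idea as mult_frac(): split x so the intermediate products stay small. */
static uint32_t percent_mult_frac(uint32_t x, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    uint32_t quot = x / denom;
    uint32_t rem = x % denom;

    return quot * numer + (rem * numer) / denom;
}
```

For a vmalloc area of 520093696 bytes and 25 percent, `percent_naive()` yields 1174405 because the product wraps modulo 2^32, while `percent_mult_frac(520093696, 25, 100)` yields the expected 130023424.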
    • M
      dm: clear all discard attributes in queue_limits when discards are disabled · 5d47c89f
      Committed by Mike Snitzer
      Otherwise, it can happen that the QUEUE_FLAG_DISCARD isn't set but the
      various discard attributes (which get exposed via sysfs) may be set.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      5d47c89f
    • M
      dm: do not set 'discards_supported' in targets that do not need it · 7dea378b
      Committed by Mike Snitzer
      The DM target's 'discards_supported' flag is intended to act as an
      override.  Meaning, even if the underlying storage doesn't support
      discards the DM target will.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      7dea378b
    • M
      dm: discard support requires all targets in a table support discards · 8a74d29d
      Committed by Mike Snitzer
      A DM device with a mix of discard capabilities (due to some underlying
      devices not having discard support) _should_ just return -EOPNOTSUPP for
      the region of the device that doesn't support discards (even if only by
      way of the underlying driver formally not supporting discards).  BUT,
      that does ask the underlying driver to handle something that it never
      advertised support for.  In doing so we're exposing users to the
      potential for an underlying disk driver hanging if/when a discard is
      issued to a device that is incapable and never claimed to support
      discards.
      
      Fix this by requiring that each DM target in a DM table provide discard
      support as a prereq for a DM device to advertise support for discards.
      
      This may cause some configurations that were happily supporting discards
      (even in the face of a mix of discard support) to stop supporting
      discards -- but the risk of users hitting driver hangs, and forced
      reboots, outweighs supporting those fringe mixed discard
      configurations.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8a74d29d
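The rule introduced here, that a device advertises discards only when every target in its table supports them, can be sketched as a simple all-of check. `toy_target` and `table_supports_discards()` are hypothetical names for illustration and do not mirror the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target flag; real DM targets carry far more state. */
struct toy_target {
    bool supports_discard;
};

/* The device advertises discards only if every target in the table does.
 * An empty table trivially returns true in this toy model. */
static bool table_supports_discards(const struct toy_target *targets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!targets[i].supports_discard)
            return false;
    }
    return true;
}
```

A table with even one non-supporting target reports no discard support, which is the configuration change the commit message warns about.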
    • M
      dm mpath: remove annoying message of 'blk_get_request() returned -11' · 9dc112e2
      Committed by Ming Lei
      It is very normal to see allocation failure, especially with blk-mq
      request_queues, so it's unnecessary to report this error and annoy
      people.
      
      In practice this 'blk_get_request() returned -11' error gets logged
      quite frequently when a blk-mq DM multipath device sees heavy IO.
      
      This change is marked for stable@ because the annoying message in
      question was included in stable@ commit 7083abbb.
      
      Fixes: 7083abbb ("dm mpath: avoid that path removal can trigger an infinite loop")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      9dc112e2
  4. 15 Nov, 2017 1 commit
  5. 11 Nov, 2017 24 commits
    • M
    • J
      ede6507d
    • J
      dm cache policy smq: allocate cache blocks in order · 9768a10d
      Committed by Joe Thornber
      Previously, cache blocks were being allocated in reverse order.  Fix
      this by pulling the block off the head of the free list.
      
      Shouldn't have any impact on performance or latency but it is more
      correct to have the cache blocks allocated/mapped in ascending order.
      This fix will slightly increase the chances of two adjacent oblocks
      being in adjacent cblocks.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      9768a10d
    • J
      dm cache policy smq: change max background work from 10240 to 4096 blocks · 8ee18ede
      Committed by Joe Thornber
      10240 blocks was too much, lowering this reduces the latency of copying
      and consumes less memory.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8ee18ede
    • J
      dm cache background tracker: limit amount of background work that may be issued at once · 64748b16
      Committed by Joe Thornber
      On large systems the cache policy can be over enthusiastic and queue far
      too much dirty data to be written back.  This consumes memory.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      64748b16
    • J
      dm cache policy smq: take origin idle status into account when queuing writebacks · deb71918
      Committed by Joe Thornber
      If the origin device is idle try and writeback more data.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      deb71918
    • J
      dm cache policy smq: handle races with queuing background_work · 1e72a8e8
      Committed by Joe Thornber
      The background_tracker holds a set of promotions/demotions that the
      cache policy wishes the core target to implement.
      
      When adding a new operation to the tracker it's possible that an
      operation on the same block is already present (but in practise this
      doesn't appear to be happening).  Catch these situations and do the
      appropriate cleanup.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      1e72a8e8
    • H
      dm raid: fix panic when attempting to force a raid to sync · 23397844
      Committed by Heinz Mauelshagen
      Requesting a sync on an active raid device via a table reload
      (see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
      skips the super_load() call that defines the superblock size
      (rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
      is called.
      
      Fix by moving the initialization of the superblock start and size
      out of super_load() to the caller (analyse_superblocks).
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      23397844
    • M
      dm integrity: allow unaligned bv_offset · 95b1369a
      Committed by Mikulas Patocka
      When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
      this unaligned memory for its buffers (if an unaligned buffer crosses a
      page, XFS frees it and allocates a full page instead - see the function
      xfs_buf_allocate_memory).
      
      dm-integrity checks if bv_offset is aligned on page size and this check
      fails with slub_debug and XFS.
      
      Fix this bug by removing the bv_offset check, leaving only the check for
      bv_len.
      
      Fixes: 7eada909 ("dm: add integrity target")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Bruno Prémont <bonbons@sysophe.eu>
      Reviewed-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      95b1369a
    • M
      dm crypt: allow unaligned bv_offset · 0440d5c0
      Committed by Mikulas Patocka
      When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
      this unaligned memory for its buffers (if an unaligned buffer crosses a
      page, XFS frees it and allocates a full page instead - see the function
      xfs_buf_allocate_memory).
      
      dm-crypt checks if bv_offset is aligned on page size and these checks
      fail with slub_debug and XFS.
      
      Fix this bug by removing the bv_offset checks. Switch to checking if
      bv_len is aligned instead of bv_offset (this check should be sufficient
      to prevent overruns if a bio with too small bv_len is received).
      
      Fixes: 8f0009a2 ("dm crypt: optionally support larger encryption sector size")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: Bruno Prémont <bonbons@sysophe.eu>
      Tested-by: Bruno Prémont <bonbons@sysophe.eu>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      0440d5c0
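The remaining length check described in this commit can be sketched as a mask test. This is an illustration only: `len_is_sector_aligned()` is a hypothetical helper, and it assumes the sector size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a segment length is a whole multiple of the sector size.
 * Assumes sector_size is a power of two, as the mask trick requires.
 * The segment offset is deliberately not checked. */
static bool len_is_sector_aligned(uint32_t bv_len, uint32_t sector_size)
{
    return (bv_len & (sector_size - 1)) == 0;
}
```

A 512-byte segment fails the check for a 4096-byte sector size, while a 4096-byte or 8192-byte segment passes regardless of where it starts within its page.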
    • M
      dm: small cleanup in dm_get_md() · 49de5769
      Committed by Mike Snitzer
      Makes dm_get_md() and dm_get_from_kobject() have similar code.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      49de5769
    • H
      dm: fix race between dm_get_from_kobject() and __dm_destroy() · b9a41d21
      Committed by Hou Tao
      The following BUG_ON was hit when testing repeated creation and removal of
      DM devices:
      
          kernel BUG at drivers/md/dm.c:2919!
          CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
          Call Trace:
           [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
           [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
           [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
           [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
           [<ffffffff811de257>] kernfs_seq_show+0x23/0x25
           [<ffffffff81199118>] seq_read+0x16f/0x325
           [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
           [<ffffffff8117b625>] __vfs_read+0x26/0x9d
           [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
           [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
           [<ffffffff8117be9d>] vfs_read+0x8f/0xcf
           [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
           [<ffffffff8117c686>] SyS_read+0x4b/0x76
           [<ffffffff817b606e>] system_call_fastpath+0x12/0x71
      
      The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
      between the test of DMF_FREEING & DMF_DELETING and dm_get() in
      dm_get_from_kobject().
      
      To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
      dm_get() are done in an atomic way, so _minor_lock is used.
      
      The other callers of dm_get() have also been checked to be OK: some
      callers invoke dm_get() under _minor_lock, some callers invoke it under
      _hash_lock, and dm_start_request() invokes it after increasing
      md->open_count.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      b9a41d21
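The fix amounts to testing the teardown flags and taking the reference inside one critical section. A minimal userspace sketch follows, assuming a pthread mutex in place of the kernel's _minor_lock; `toy_md`, `toy_get()` and `toy_mark_freeing()` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mapped_device. */
struct toy_md {
    bool freeing;   /* models DMF_FREEING / DMF_DELETING */
    int holders;    /* models the reference taken by dm_get() */
};

static pthread_mutex_t minor_lock = PTHREAD_MUTEX_INITIALIZER;

/* Test the flag and take the reference under one lock, so teardown
 * cannot slip in between the two steps. */
static struct toy_md *toy_get(struct toy_md *md)
{
    struct toy_md *ret = NULL;

    pthread_mutex_lock(&minor_lock);
    if (!md->freeing) {
        md->holders++;
        ret = md;
    }
    pthread_mutex_unlock(&minor_lock);
    return ret;
}

/* Teardown sets the flag under the same lock. */
static void toy_mark_freeing(struct toy_md *md)
{
    pthread_mutex_lock(&minor_lock);
    md->freeing = true;
    pthread_mutex_unlock(&minor_lock);
}
```

Once `toy_mark_freeing()` has run, `toy_get()` returns NULL and leaves the holder count untouched, so no caller can obtain a reference to a device that is being destroyed.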
    • M
      dm: allocate struct mapped_device with kvzalloc · 856eb091
      Committed by Mikulas Patocka
      The structure srcu_struct can be very big, its size is proportional to the
      value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
      io_barrier in the struct mapped_device has 84kB in the debugging kernel
      and 50kB in the non-debugging kernel. The large size may result in failure
      of the function kzalloc_node.
      
      In order to avoid the allocation failure, we use the function
      kvzalloc_node, this function falls back to vmalloc if a large contiguous
      chunk of memory is not available. This patch also moves the field
      io_barrier to the last position of struct mapped_device - the reason is
      that on many processor architectures, short memory offsets result in
      smaller code than long memory offsets - on x86-64 it reduces code size by
      320 bytes.
      
      Note to stable kernel maintainers - the kernels 4.11 and older don't have
      the function kvzalloc_node, you can use the function vzalloc_node instead.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      856eb091
    • D
      dm zoned: ignore last smaller runt zone · 114e0259
      Committed by Damien Le Moal
      The SCSI layer allows ZBC drives to have a smaller last runt zone. For
      such a device, specifying the entire capacity for a dm-zoned target
      table entry fails because the specified capacity is not aligned on a
      device zone size indicated in the request queue structure of the
      device.
      
      Fix this problem by ignoring the last runt zone in the entry length
      when setting up the dm-zoned target (ctr method) and when iterating table
      entries of the target (iterate_devices method). This allows dm-zoned
      users to still easily setup a target using the entire device capacity
      (as mandated by dm-zoned) or the aligned capacity excluding the last
      runt zone.
      
      While at it, replace direct references to the device queue chunk_sectors
      limit with calls to the accessor blk_queue_zone_sectors().
      Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      114e0259
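Ignoring the runt zone comes down to rounding the capacity down to a whole number of zones. The sketch below shows that arithmetic only; `zone_aligned_capacity()` is a hypothetical helper, and it assumes the zone size in sectors is a power of two.

```c
#include <stdint.h>

/* Round a capacity (in sectors) down to a whole number of zones.
 * Assumes zone_sectors is a power of two. */
static uint64_t zone_aligned_capacity(uint64_t capacity, uint64_t zone_sectors)
{
    return capacity & ~(zone_sectors - 1);
}
```

For example, with 524288-sector zones, a device of ten full zones plus a 1000-sector runt zone has an aligned capacity of 5242880 sectors, and a capacity that is already aligned is returned unchanged.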
    • J
      dm space map metadata: use ARRAY_SIZE · fbc61291
      Committed by Jérémy Lefaure
      Using the ARRAY_SIZE macro improves the readability of the code.
      
      Found with Coccinelle with the following semantic patch:
      @r depends on (org || report)@
      type T;
      T[] E;
      position p;
      @@
      (
       (sizeof(E)@p /sizeof(*E))
      |
       (sizeof(E)@p /sizeof(E[...]))
      |
       (sizeof(E)@p /sizeof(T))
      )
      Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      fbc61291
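The pattern that the semantic patch rewrites can be shown with a userspace version of the macro. This is a sketch: the definition below omits the type checking that the kernel's ARRAY_SIZE adds, and `thresholds` and `sum_thresholds()` are made-up names.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel macro, without its type checking. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int thresholds[] = { 10, 20, 30, 40 };

/* Sum every element without hard-coding the element count. */
static int sum_thresholds(void)
{
    int total = 0;

    for (size_t i = 0; i < ARRAY_SIZE(thresholds); i++)
        total += thresholds[i];
    return total;
}
```

The macro only works on true arrays; applied to a pointer it would silently compute the wrong value, which is one reason the kernel version adds a compile-time check.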
    • R
      dm log writes: add support for DAX · 98d82f48
      Committed by Ross Zwisler
      Now that we have the ability to log filesystem writes using a flat buffer, add
      support for DAX.
      
      The motivation for this support is the need for an xfstest that can test
      the new MAP_SYNC DAX flag.  By logging the filesystem activity with
      dm-log-writes we can show that the MAP_SYNC page faults are writing out
      their metadata as they happen, instead of requiring an explicit
      msync/fsync.
      
      Unfortunately we can't easily track data that has been written via
      mmap() now that the dax_flush() abstraction was removed by commit
      c3ca015f ("dax: remove the pmem_dax_ops->flush abstraction").
      Otherwise we could just treat each flush as a big write, and store the
      data that is being synced to media.  It may be worthwhile to add the
      dax_flush() entry point back, just as a notifier so we can do this
      logging.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      98d82f48
    • R
      dm log writes: add support for inline data buffers · e5a20660
      Committed by Ross Zwisler
      Currently dm-log-writes supports writing filesystem data via BIOs, and
      writing internal metadata from a flat buffer via write_metadata().
      
      For DAX writes, though, we won't have a BIO, but will instead have an
      iterator that we'll want to use to fill a flat data buffer.
      
      So, create write_inline_data() which allows us to write filesystem data
      using a flat buffer as a source, and wire it up in log_one_block().
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      e5a20660
    • M
      dm cache: simplify get_per_bio_data() by removing data_size argument · 693b960e
      Committed by Mike Snitzer
      There is only one per_bio_data size now that writethrough-specific data
      was removed from the per_bio_data structure.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      693b960e
    • M
      dm cache: remove all obsolete writethrough-specific code · 9958f1d9
      Committed by Mike Snitzer
      Now that the writethrough code is much simpler there is no need to track
      so much state or cascade bio submission (as was done, via
      writethrough_endio(), to issue origin then cache IO in series).
      
      As such the obsolete writethrough list and workqueue is also removed.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      9958f1d9
    • M
      dm cache: submit writethrough writes in parallel to origin and cache · 2df3bae9
      Committed by Mike Snitzer
      Discontinue issuing writethrough write IO in series to the origin and
      then cache.
      
      Use bio_clone_fast() to create a new origin clone bio that will be
      mapped to the origin device and then bio_chain() it to the bio that gets
      remapped to the cache device.  The origin clone bio does _not_ have a
      copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
      be called.
      
      The cache bio (parent bio) will not complete until the origin bio has
      completed -- this fulfills bio_clone_fast()'s requirements as well as
      the requirement to not complete the original IO until the write IO has
      completed to both the origin and cache device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      2df3bae9
    • M
      dm cache: pass cache structure to mode functions · 8e3c3827
      Committed by Mike Snitzer
      No functional changes, just a bit cleaner than passing cache_features
      structure.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8e3c3827
    • J
      dm cache: fix race condition in the writeback mode overwrite_bio optimisation · d1260e2a
      Committed by Joe Thornber
      When a DM cache in writeback mode moves data between the slow and fast
      device it can often avoid a copy if the triggering bio either:
      
      i) covers the whole block (no point copying if we're about to overwrite it)
      ii) the migration is a promotion and the origin block is currently discarded
      
      Prior to this fix there was a race with case (ii).  The discard status
      was checked with a shared lock held (rather than exclusive).  This meant
      another bio could run in parallel and write data to the origin, removing
      the discard state.  After the promotion the parallel write would have
      been lost.
      
      With this fix the discard status is re-checked once the exclusive lock
      has been acquired.  If the block is no longer discarded it falls back to
      the slower full copy path.
      
      Fixes: b29d4986 ("dm cache: significant rework to leverage dm-bio-prison-v2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      d1260e2a
    • Z
      md: free unused memory after bitmap resize · 0868b99c
      Committed by Zdenek Kabelac
      When a bitmap is resized, the old allocated chunks are not released
      once the resized bitmap starts to use the new space.
      
      This fixes in particular kmemleak reports like this one:
      
      unreferenced object 0xffff8f4311e9c000 (size 4096):
        comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
        hex dump (first 32 bytes):
          02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
          02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
        backtrace:
          [<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
          [<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
          [<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
          [<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
          [<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
          [<ffffffffa676f1ae>] bitmap_load+0xbe/0x160
          [<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
          [<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
          [<ffffffffa6774b52>] dm_resume+0x122/0x140
          [<ffffffffa6779b9f>] dev_suspend+0x18f/0x290
          [<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
          [<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
          [<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
          [<ffffffffa62d7269>] SyS_ioctl+0x79/0x90
          [<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      0868b99c
    • Z
      md: release allocated bitset sync_set · 0202ce8a
      Committed by Zdenek Kabelac
      This patch fixes a kmemleak report on the md_stop() path, which is likely
      used only by the dm-raid wrapper. The md code uses mddev_put(), where both
      bitsets are released; however, this freeing is not shared.
      
      Also set the bio_set and sync_set pointers to NULL, just as mddev_put()
      does.
      Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      0202ce8a
  6. 09 Nov, 2017 2 commits
    • H
      md/bitmap: clear BITMAP_WRITE_ERROR bit before writing it to sb · 97f0eb9f
      Committed by Hou Tao
      For a RAID1 device using a file-based bitmap, if a bitmap write error
      occurs but the later writes succeed, it's possible both BITMAP_STALE
      and BITMAP_WRITE_ERROR bits will be written to the bitmap super block,
      the BITMAP_STALE bit will be handled properly and be cleared, but the
      BITMAP_WRITE_ERROR bit in sb->flags will cause bitmap_create() to fail.
      
      So clear it to protect against the write failure-and-then-recovery case.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      97f0eb9f
    • N
      md: be cautious about using ->curr_resync_completed for ->recovery_offset · db0505d3
      Committed by NeilBrown
      The ->recovery_offset shows how much of a non-InSync device is actually
      in sync - how much has been recovered.
      
      When performing a recovery, ->curr_resync and ->curr_resync_completed
      follow the device address being recovered and so can be used to update
      ->recovery_offset.
      
      When performing a reshape, ->curr_resync* might follow the device
      addresses (raid5) or might follow array addresses (raid10), so cannot
      in general be used to set ->recovery_offset.  When reshaping backwards,
      ->curr_resync* measures from the *end* of the array-or-device, so is
      particularly unhelpful.
      
      So change the common code in md.c to only use ->curr_resync_completed
      for the simple recovery case, and add code to raid5.c to update
      ->recovery_offset during a forwards reshape.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      db0505d3
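The rule stated in this commit message (only trust ->curr_resync_completed for a plain recovery) can be sketched in userspace C. The structs and the function are hypothetical simplifications for illustration; the real md.c code tests recovery flags on the array and device rather than booleans, and the raid5.c part of the change is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the array-wide state. */
struct demo_array {
	bool recovering;                /* a recovery is running */
	bool reshaping;                 /* a reshape is running */
	uint64_t curr_resync_completed; /* progress, in sectors */
};

/* Hypothetical, simplified view of one member device. */
struct demo_dev {
	bool in_sync;
	uint64_t recovery_offset;
};

/*
 * Advance recovery_offset only in the simple recovery case. During a
 * reshape the progress counter may follow array addresses or count from
 * the end, so it is ignored here.
 */
void demo_update_recovery_offset(const struct demo_array *a,
				 struct demo_dev *d)
{
	if (!a->recovering || a->reshaping)
		return;
	if (d->in_sync)
		return;
	if (a->curr_resync_completed > d->recovery_offset)
		d->recovery_offset = a->curr_resync_completed;
}
```

The offset only ever moves forward in this sketch, so a stale or smaller progress value cannot undo recovery that has already been recorded.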
  7. 03 Nov, 2017 1 commit
  8. 02 Nov, 2017 1 commit
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to apply to a
      file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was on a */uapi/* path, it was "GPL-2.0 WITH
         Linux-syscall-note"; otherwise it was "GPL-2.0".  The result was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
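The first few heuristics in the list above can be sketched as a small C helper. This is an illustration only, not the tooling used for the patch series: the function names are hypothetical, a NULL scanner result stands for "no license detected", and the "/uapi/" substring test is a simplification of the */uapi/* path pattern. The special handling of uapi files that already carry GPL-family text is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Default identifier for a file in which neither scanner found a license. */
const char *demo_default_spdx_id(const char *path)
{
	if (strstr(path, "/uapi/") != NULL)
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}

/*
 * Combine two scanner results. Returns the concluded identifier, or NULL
 * when the scanners disagree and a manual inspection is required.
 */
const char *demo_conclude(const char *path, const char *scan_a,
			  const char *scan_b)
{
	if (scan_a == NULL && scan_b == NULL)
		return demo_default_spdx_id(path);
	if (scan_a != NULL && scan_b != NULL && strcmp(scan_a, scan_b) == 0)
		return scan_a;
	return NULL; /* one-sided or conflicting detection */
}
```

Returning NULL for disagreement keeps the "needs a human" case explicit, which mirrors the manual-inspection step described in the list.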
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were any new insights.
      The Windriver scanner is based in part on an older version of FOSSology,
      so they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to
      have copy/paste license identifier errors, and have been fixed to reflect
      the correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
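The commit message notes that header and source .c files need different comment types for the SPDX tag. A minimal sketch of that distinction follows. It is not the script written by Thomas and Greg; the function names are hypothetical, and the mapping (block comment for .h, line comment for .c) is an assumption taken from the kernel's license-rules documentation rather than from this message.

```c
#include <stdio.h>
#include <string.h>

/* Returns nonzero if s ends with suffix. */
int demo_has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Format the SPDX tag line for a file. Returns the snprintf() result,
 * or -1 for file types this sketch does not handle.
 */
int demo_spdx_line(const char *path, const char *id, char *buf, size_t len)
{
	if (demo_has_suffix(path, ".h"))
		return snprintf(buf, len,
				"/* SPDX-License-Identifier: %s */", id);
	if (demo_has_suffix(path, ".c"))
		return snprintf(buf, len,
				"// SPDX-License-Identifier: %s", id);
	return -1; /* Makefiles, scripts, etc. would need other styles */
}
```

A real tool would also need comment styles for assembly, Makefiles, and scripts, which is presumably part of what the refinement "to detect more types of files automatically" covered.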