1. 07 Jan 2018, 1 commit
  2. 17 Dec 2017, 1 commit
  3. 08 Dec 2017, 2 commits
    • dm bufio: fix shrinker scans when (nr_to_scan < retain_target) · fbc7c07e
      Suren Baghdasaryan authored
      When the system is under memory pressure, the dm bufio shrinker is
      observed to reclaim only one buffer per scan. This change fixes the
      following two issues in the dm bufio shrinker that cause this behavior:
      
      1. The ((nr_to_scan - freed) <= retain_target) condition is used to
      terminate the slab scan. This assumes that nr_to_scan is equal
      to the LRU size, which is not necessarily true because do_shrink_slab()
      in vmscan.c calculates nr_to_scan from multiple inputs.
      As a result, when nr_to_scan is less than retain_target (64) the scan
      terminates after the first iteration, effectively reclaiming one
      buffer per scan and making scans very inefficient. This hurts vmscan
      performance especially because a mutex is acquired and released every
      time dm_bufio_shrink_scan() is called.
      The new implementation uses the ((LRU size - freed) <= retain_target)
      condition for scan termination. The LRU size can be safely determined
      inside __scan() because this function is called after dm_bufio_lock().
      
      2. do_shrink_slab() uses the value returned by dm_bufio_shrink_count() to
      determine the number of freeable objects in the slab. However, dm_bufio
      always retains retain_target buffers on its LRU and terminates
      a scan when that mark is reached. Returning the entire LRU size from
      dm_bufio_shrink_count() is therefore misleading, because it does not
      represent the number of objects the shrinker can actually reclaim during
      a scan. Returning (LRU size - retain_target) better represents the
      number of freeable objects. This way do_shrink_slab() returns 0 when
      (LRU size < retain_target) and vmscan will not scan this shrinker at
      all, avoiding scans that cannot reclaim any memory; see the sketch below.
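
      A condensed sketch of the reworked logic (names such as c->n_buffers[]
      and get_retain_buffers() follow the upstream dm-bufio code; error
      handling and the dirty-list walk are elided here):

          static unsigned long
          dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
          {
              struct dm_bufio_client *c =
                  container_of(shrink, struct dm_bufio_client, shrinker);
              unsigned long count = READ_ONCE(c->n_buffers[LIST_CLEAN]) +
                                    READ_ONCE(c->n_buffers[LIST_DIRTY]);
              unsigned long retain_target = get_retain_buffers(c);

              /* report only what a scan could actually free */
              return (count < retain_target) ? 0 : (count - retain_target);
          }

          static long __scan(struct dm_bufio_client *c, unsigned long nr_to_scan,
                             gfp_t gfp_mask)
          {
              struct dm_buffer *b, *tmp;
              long freed = 0;
              /* safe to read: called with dm_bufio_lock() held */
              unsigned long count = c->n_buffers[LIST_CLEAN] +
                                    c->n_buffers[LIST_DIRTY];
              unsigned long retain_target = get_retain_buffers(c);

              list_for_each_entry_safe_reverse(b, tmp, &c->lru[LIST_CLEAN],
                                               lru_list) {
                  if (__try_evict_buffer(b, gfp_mask))
                      freed++;
                  /* terminate on the LRU size, not on nr_to_scan */
                  if (!--nr_to_scan || ((count - freed) <= retain_target))
                      return freed;
                  cond_resched();
              }
              return freed;
          }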
      
      Test: tested on an Android device running
      <AOSP>/system/extras/alloc-stress, which generates memory pressure
      and causes intensive shrinker scans.
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      fbc7c07e
    • dm mpath: fix bio-based multipath queue_if_no_path handling · c1fd0abe
      Mike Snitzer authored
      Commit ca5beb76 ("dm mpath: micro-optimize the hot path relative to
      MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's
      "test_02_sdev_delete".
      
      Restoring the logic that existed prior to commit ca5beb76 fixes this
      bio-based DM-multipath regression.  Also verified that all mptest tests
      pass with request-based DM-multipath.
      
      This commit effectively reverts commit ca5beb76 -- but it does so
      without reintroducing the need to take the m->lock spinlock in
      must_push_back_{rq,bio}.
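
      A rough sketch of the preserved lock-free checks (simplified from
      drivers/md/dm-mpath.c; a READ_ONCE() snapshot of m->flags stands in
      for taking m->lock):

          static bool __must_push_back(struct multipath *m, unsigned long flags)
          {
              return ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) !=
                       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &flags)) &&
                      dm_noflush_suspending(m->ti));
          }

          static bool must_push_back_rq(struct multipath *m)
          {
              unsigned long flags = READ_ONCE(m->flags);

              return test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) ||
                     __must_push_back(m, flags);
          }

          static bool must_push_back_bio(struct multipath *m)
          {
              unsigned long flags = READ_ONCE(m->flags);

              return __must_push_back(m, flags);
          }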
      
      Fixes: ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH")
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      c1fd0abe
  4. 04 Dec 2017, 2 commits
    • dm: fix various targets to dm_register_target after module __init resources created · 7e6358d2
      monty_pavel@sina.com authored
      A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
      processes race to load the dm-thin-pool module:
      
       PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
        #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
        #1 [ffff883cd743d660] crash_kexec at ffffffff810c5992
        #2 [ffff883cd743d730] oops_end at ffffffff81515c90
        #3 [ffff883cd743d760] no_context at ffffffff81049f1b
        #4 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
        #5 [ffff883cd743d800] bad_area at ffffffff8104a2ce
        #6 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
        #7 [ffff883cd743d950] do_page_fault at ffffffff81517bae
        #8 [ffff883cd743d980] page_fault at ffffffff81514f95
           [exception RIP: kmem_cache_alloc+108]
           RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046
           RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0
           RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
           RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000
           R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0
           R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246
           ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
        #9 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
       #10 [ffff883cd743da80] mempool_create_node at ffffffff81122083
       #11 [ffff883cd743dad0] mempool_create at ffffffff811220f4
       #12 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
       #13 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
       #14 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
       #15 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]
      
      The race results in a NULL pointer because:
      
      Process A (vgchange -ay -K):
        a. send DM_LIST_VERSIONS_CMD ioctl;
        b. pool_target not registered;
        c. modprobe dm_thin_pool and wait until it finishes.

      Process B (vgchange -ay -K):
        a. send DM_LIST_VERSIONS_CMD ioctl;
        b. pool_target registered;
        c. table_load->dm_table_add_target->pool_ctr;
        d. _new_mapping_cache is NULL, causing the panic.
      Note:
        1. process A and process B are two concurrent processes.
        2. pool_target can already be seen by process B, but the
        initialization of _new_mapping_cache has not yet finished.
      
      To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
      with the same problem, simply call dm_register_target() only after all
      resources created during module init (in functions marked __init) have
      been set up, as sketched below.
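
      A condensed sketch of the corrected ordering for dm-thin's module init
      (the real __init function also registers the pool target and creates
      additional caches):

          static int __init dm_thin_init(void)
          {
              int r;

              /* create all module-lifetime resources first ... */
              _new_mapping_cache = KMEM_CACHE(dm_thin_new_mapping, 0);
              if (!_new_mapping_cache)
                  return -ENOMEM;

              /* ... and only then make the target visible to table_load */
              r = dm_register_target(&thin_target);
              if (r) {
                  kmem_cache_destroy(_new_mapping_cache);
                  return r;
              }

              return 0;
          }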
      
      Cc: stable@vger.kernel.org
      Signed-off-by: monty <monty_pavel@sina.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      7e6358d2
    • dm table: fix regression from improper dm_dev_internal.count refcount_t conversion · afc567a4
      Mike Snitzer authored
      Multiple refcounts are needed if the device was already added.  The
      micro-optimization of setting the refcount to 1 on first add (rather
      than falling through to a common refcount_inc) lost sight of the fact
      that the refcount_inc is also needed when the device already exists
      and its mode does not need to be upgraded.
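
      The fixed flow looks roughly like this (a sketch of dm_get_device()'s
      table bookkeeping; find_device(), upgrade_mode(), and the 'out' label
      are simplified stand-ins for the upstream code):

          dd = find_device(&t->devices, dev);
          if (!dd) {
              /* first add: allocate dd and start its refcount at 1 */
              refcount_set(&dd->count, 1);
              goto out;
          } else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
              r = upgrade_mode(dd, mode, t->md);
              if (r)
                  return r;
          }
          /*
           * The device already existed: take a reference whether or not
           * the mode had to be upgraded.
           */
          refcount_inc(&dd->count);
      out:
          *result = dd->dm_dev;
          return 0;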
      
      Fixes: 2a0b4682 ("dm: convert dm_dev_internal.count from atomic_t to refcount_t")
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      afc567a4
  5. 02 Dec 2017, 4 commits
  6. 25 Nov 2017, 4 commits
    • bcache: check return value of register_shrinker · 6c4ca1e3
      Michael Lyle authored
      register_shrinker() is now __must_check, so check its return value to
      kill a warning.  The caller of bch_btree_cache_alloc() in super.c
      appropriately checks the return value, so the error is fully plumbed
      through.
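
      The change amounts to propagating the error, roughly (simplified from
      drivers/md/bcache/btree.c):

          int bch_btree_cache_alloc(struct cache_set *c)
          {
              /* ... btree cache setup elided ... */

              c->shrink.count_objects = bch_mca_count;
              c->shrink.scan_objects = bch_mca_scan;
              c->shrink.seeks = 4;
              c->shrink.batch = c->btree_pages * 2;

              /*
               * register_shrinker() is __must_check; hand any failure back
               * to the caller in super.c, which already checks our result.
               */
              return register_shrinker(&c->shrink);
          }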
      
      This V2 fixes checkpatch warnings and improves the commit description,
      as I was too hasty getting the previous version out.
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Vojtech Pavlik <vojtech@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6c4ca1e3
    • bcache: recover data from backing when data is clean · e393aa24
      Rui Hua authored
      When we send a read request and hit clean data in the cache device,
      there is a situation in bcache called a cache read race (see the comment
      at the tail of cache_lookup(); the following explanation is copied from
      there): the bucket we're reading from might be reused while our bio is
      in flight, and we could then end up reading the wrong data. We guard
      against this by checking (in bch_cache_read_endio()) if the pointer is
      stale again; if so, we treat it as an error (s->iop.error = -EINTR) and
      reread from the backing device (but we don't pass that error up anywhere).
      
      It should be noted that a cache read race can happen under normal
      circumstances, not only when the SSD fails; occurrences are counted
      and shown in /sys/fs/bcache/XXX/internal/cache_read_races.
      
      Without this patch, in writeback mode we never reread from the backing
      device when a cache read race happens, until the whole cache device is
      clean, because the condition
      (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
      cached_dev_read_error(). In this situation s->iop.error (= -EINTR) is
      passed up, and the user ultimately receives -EINTR when the bio ends;
      this is not appropriate and is confusing to the upper-layer application.
      
      In this patch, we use s->read_dirty_data to judge whether the read
      request hit dirty data in the cache device; it is safe to reread from
      the backing device when the request hit only clean data. This not only
      handles the cache read race, but also recovers data when a read request
      from the cache device fails.
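
      The resulting recovery check is roughly (condensed from
      cached_dev_read_error(); helper names follow upstream bcache):

          static void cached_dev_read_error(struct closure *cl)
          {
              struct search *s = container_of(cl, struct search, cl);
              struct bio *bio = &s->bio.bio;

              /*
               * If the request hit only clean data in the cache, the backing
               * device still holds a valid copy and the read can be retried
               * there; dirty cache data would make the backing copy stale.
               */
              if (s->recoverable && !s->read_dirty_data) {
                  trace_bcache_read_retry(s->orig_bio);

                  s->iop.error = 0;
                  do_bio_hook(s, s->orig_bio);

                  /* retry from the backing device */
                  closure_bio_submit(bio, cl);
              }

              continue_at(cl, cached_dev_cache_miss_done, NULL);
          }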
      
      [edited by mlyle to fix up whitespace, commit log title, comment
      spelling]
      
      Fixes: d59b2379 ("bcache: only permit to recovery read error when cache device is clean")
      Cc: <stable@vger.kernel.org> # 4.14
      Signed-off-by: Hua Rui <huarui.dev@gmail.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e393aa24
    • bcache: Fix building error on MIPS · cf33c1ee
      Huacai Chen authored
      This patch tries to fix the build error on MIPS. The cause is that MIPS
      already defines a PTR macro, which conflicts with the PTR macro in
      include/uapi/linux/bcache.h.
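
      Schematically, the conflict and its resolution (macro bodies are elided;
      the MAKE_PTR name matches the upstream rename, but treat the details
      here as illustrative):

          /* arch/mips/include/asm/asm.h already defines an assembler helper: */
          #define PTR .dword                     /* .word on 32-bit MIPS */

          /* include/uapi/linux/bcache.h used the same name for key packing: */
          #define PTR(gen, offset, dev) /* ... */

          /* after the fix, the bcache macro no longer shadows the MIPS one: */
          #define MAKE_PTR(gen, offset, dev) /* ... */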
      
      [fixed by mlyle: corrected a line-length issue]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cf33c1ee
    • bcache: add a comment in journal bucket reading · bb22cafd
      Tang Junhui authored
      The journal buckets form a circular buffer; the buckets can look like
      YYYNNNYY, meaning the first (oldest) valid journal is in the 7th bucket
      and the latest valid journal is in the 3rd bucket. In this case, if we
      do not try the zero index first, we may find a valid journal in the 7th
      bucket, then call find_next_bit(bitmap, ca->sb.njournal_buckets, l + 1)
      to get the first invalid bucket after the 7th; because all these buckets
      are valid, there is no 1 bit in the bitmap, so find_next_bit() returns
      ca->sb.njournal_buckets (8). After that, bcache reads the journal only
      from the 7th and 8th buckets, and the entries in the first through third
      buckets are lost.
      
      So it is important to let developers know that we need to try the zero
      index first in the hash search, and to avoid breaking this in future
      code modifications.
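
      A simplified view of the probe loop in bch_journal_read() that this
      comment documents (read_bucket() and the bitmap bookkeeping are elided;
      note that i == 0 hashes to bucket 0, which is the zero-index-first
      behavior being protected):

          for (i = 0; i < ca->sb.njournal_buckets; i++) {
              /*
               * Golden-ratio hash probe; i == 0 yields l == 0, so the zero
               * index is always tried first and wrapped-around entries at
               * the start of the ring cannot be skipped.
               */
              l = (i * 2654435769U) % ca->sb.njournal_buckets;

              if (test_bit(l, bitmap))    /* this bucket was already read */
                  break;

              if (read_bucket(l))         /* found valid journal entries */
                  goto bsearch;
          }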
      
      [ML: Fixed whitespace & formatting & file permissions]
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bb22cafd
  7. 17 Nov 2017, 5 commits
    • dm bufio: fix integer overflow when limiting maximum cache size · 74d4108d
      Eric Biggers authored
      The default max_cache_size_bytes for dm-bufio is meant to be the lesser
      of 25% of the size of the vmalloc area and 2% of the size of lowmem.
      However, on 32-bit systems the intermediate result in the expression
      
          (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
      
      overflows, causing the wrong result to be computed.  For example, on a
      32-bit system where the vmalloc area is 520093696 bytes, the result is
      1174405 rather than the expected 130023424, which makes the maximum
      cache size much too small (far less than 2% of lowmem).  This causes
      severe performance problems for dm-verity users on affected systems.
      
      Fix this by using mult_frac() to correctly multiply by a percentage.  Do
      this for all places in dm-bufio that multiply by a percentage.  Also
      replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
      to the comment is now defined in include/linux/vmalloc.h.
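
      The shape of the fix (simplified; mult_frac() from
      include/linux/kernel.h divides before multiplying, so the intermediate
      value cannot overflow):

          /* before: the 32-bit intermediate product overflows */
          dm_bufio_default_cache_size =
              (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;

          /* after: overflow-safe percentage of VMALLOC_TOTAL (vmalloc.h) */
          dm_bufio_default_cache_size =
              mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);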
      
      Depends-on: 9993bc63 ("sched/x86: Fix overflow in cyc2ns_offset")
      Fixes: 95d402f0 ("dm: add bufio")
      Cc: <stable@vger.kernel.org> # v3.2+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      74d4108d
    • dm: clear all discard attributes in queue_limits when discards are disabled · 5d47c89f
      Mike Snitzer authored
      Otherwise it can happen that QUEUE_FLAG_DISCARD isn't set while the
      various discard attributes (which are exposed via sysfs) remain set;
      the sketch below shows the fix.
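
      Roughly, in dm_table_set_restrictions() (a sketch; the exact set of
      zeroed fields follows the queue_limits definition):

          if (!dm_table_supports_discards(t)) {
              queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
              /* also zero the discard attributes exposed via sysfs */
              q->limits.max_discard_sectors = 0;
              q->limits.max_hw_discard_sectors = 0;
              q->limits.discard_granularity = 0;
              q->limits.discard_alignment = 0;
              q->limits.discard_misaligned = 0;
          } else
              queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
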
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      5d47c89f
    • dm: do not set 'discards_supported' in targets that do not need it · 7dea378b
      Mike Snitzer authored
      The DM target's 'discards_supported' flag is intended to act as an
      override: even if the underlying storage doesn't support discards, the
      DM target will.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      7dea378b
    • dm: discard support requires all targets in a table support discards · 8a74d29d
      Mike Snitzer authored
      A DM device with a mix of discard capabilities (due to some underlying
      devices not having discard support) _should_ just return -EOPNOTSUPP for
      the region of the device that doesn't support discards (even if only by
      way of the underlying driver formally not supporting discards).  BUT,
      that asks the underlying driver to handle something it never advertised
      support for.  In doing so we expose users to the potential for an
      underlying disk driver to hang if/when a discard is issued to a device
      that is incapable of, and never claimed to support, discards.
      
      Fix this by requiring that each DM target in a DM table provide discard
      support as a prerequisite for the DM device to advertise support for
      discards.

      This may cause some configurations that were happily supporting discards
      (even in the face of mixed discard support) to stop supporting discards,
      but the risk of users hitting driver hangs, and the forced reboots that
      follow, outweighs supporting those fringe mixed-discard configurations.
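
      The requirement reduces to a per-target check, roughly (simplified from
      dm-table.c; device_not_discard_capable() is the iterate_devices callback
      that reports a data device lacking discard support):

          static bool dm_table_supports_discards(struct dm_table *t)
          {
              struct dm_target *ti;
              unsigned i;

              for (i = 0; i < dm_table_get_num_targets(t); i++) {
                  ti = dm_table_get_target(t, i);

                  if (!ti->num_discard_bios)
                      return false;

                  /*
                   * Either the target overrides support by setting
                   * 'discards_supported', or every underlying data device
                   * must itself support discards.
                   */
                  if (!ti->discards_supported &&
                      (!ti->type->iterate_devices ||
                       ti->type->iterate_devices(ti, device_not_discard_capable,
                                                 NULL)))
                      return false;
              }

              return true;
          }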
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      8a74d29d
    • dm mpath: remove annoying message of 'blk_get_request() returned -11' · 9dc112e2
      Ming Lei authored
      It is very normal to see allocation failures, especially with blk-mq
      request_queues, so it is unnecessary to report this error and annoy
      people.
      
      In practice this 'blk_get_request() returned -11' error gets logged
      quite frequently when a blk-mq DM multipath device sees heavy IO.
      
      This change is marked for stable@ because the annoying message in
      question was included in stable@ commit 7083abbb.
      
      Fixes: 7083abbb ("dm mpath: avoid that path removal can trigger an infinite loop")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      9dc112e2
  8. 15 Nov 2017, 1 commit
  9. 11 Nov 2017, 20 commits