1. 09 4月, 2017 1 次提交
  2. 11 6月, 2016 1 次提交
  3. 08 6月, 2016 2 次提交
  4. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  5. 23 8月, 2013 1 次提交
  6. 02 3月, 2013 1 次提交
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka 提交于
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      df5d2e90
  7. 22 12月, 2012 1 次提交
    • M
      dm kcopyd: add WRITE SAME support to dm_kcopyd_zero · 70d6c400
      Mike Snitzer 提交于
      Add WRITE SAME support to dm-io and make it accessible to
      dm_kcopyd_zero().  dm_kcopyd_zero() provides an asynchronous interface
      whereas the blkdev_issue_write_same() interface is synchronous.
      
      WRITE SAME is a SCSI command that can be leveraged for more efficient
      zeroing of a specified logical extent of a device which supports it.
      Only a single zeroed logical block is transfered to the target for each
      WRITE SAME and the target then writes that same block across the
      specified extent.
      
      The dm thin target uses this.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      70d6c400
  8. 01 11月, 2011 1 次提交
  9. 24 10月, 2011 1 次提交
  10. 02 8月, 2011 3 次提交
  11. 27 7月, 2011 1 次提交
  12. 29 5月, 2011 8 次提交
  13. 10 3月, 2011 2 次提交
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe 提交于
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      721a9602
    • J
      block: remove per-queue plugging · 7eaceacc
      Jens Axboe 提交于
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So lets kill off the old plugging along with aops->sync_page().
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7eaceacc
  14. 14 1月, 2011 4 次提交
    • T
      dm: use non reentrant workqueues if equivalent · 9c4376de
      Tejun Heo 提交于
      kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
      serve only a single work item from the dm instance, so non-reentrant
      workqueues would provide the same ordering guarantees as ordered ones
      while allowing CPU affinity and use of the workqueues for other
      purposes.  Switch them to non-reentrant workqueues.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      9c4376de
    • T
      dm: convert workqueues to alloc_ordered · 4d4d66ab
      Tejun Heo 提交于
      Convert all create[_singlethread]_work() users to the new
      alloc[_ordered]_workqueue().  This conversion is mechanical and
      doesn't introduce any behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      4d4d66ab
    • M
      dm kcopyd: delay unplugging · 8d35d3e3
      Mikulas Patocka 提交于
      Make kcopyd merge more I/O requests by using device unplugging.
      
      Without this patch, each I/O request is dispatched separately to the device.
      If the device supports tagged queuing, there are many small requests sent
      to the device. To improve performance, this patch will batch as many requests
      as possible, allowing the queue to merge consecutive requests, and send them
      to the device at once.
      
      In my tests (15k SCSI disk), this patch improves sequential write throughput:
      
        Sequential write throughput (chunksize of 4k, 32k, 512k)
        unpatched: 15.2, 18.5, 17.5 MB/s
        patched:   14.4, 22.6, 23.0 MB/s
      
      In most common uses (snapshot or two-way mirror), kcopyd is only used for
      two devices, one for reading and the other for writing, thus this optimization
      is implemented only for two devices. The optimization may be extended to n-way
      mirrors with some code complexity increase.
      
      We keep track of two block devices to unplug (one for read and the
      other for write) and unplug them when exiting "do_work" thread.  If
      there are more devices used (in theory it could happen, in practice it
      is rare), we unplug immediately.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      8d35d3e3
    • M
      dm io: remove BIO_RW_SYNCIO flag from kcopyd · d9bf0b50
      Mikulas Patocka 提交于
      Remove the REQ_SYNC flag to improve write throughput when writing
      to the origin with a snapshot on the same device (using the CFQ I/O
      scheduler).
      
      Sequential write throughput (chunksize of 4k, 32k, 512k)
        unpatched:  8.5,  8.6,  9.3 MB/s
        patched:   15.2, 18.5, 17.5 MB/s
      
      Snapshot exception reallocations are triggered by writes that are
      usually async, so mark the associated dm_io_request as async as well.
      This helps when using the CFQ I/O scheduler because it has separate
      queues for sync and async I/O.  Async is optimized for throughput; sync
      for latency.  With this change we're consciously favoring throughput over
      latency.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d9bf0b50
  15. 08 8月, 2010 1 次提交
    • C
      block: unify flags for struct bio and struct request · 7b6d91da
      Christoph Hellwig 提交于
      Remove the current bio flags and reuse the request flags for the bio, too.
      This allows to more easily trace the type of I/O from the filesystem
      down to the block driver.  There were two flags in the bio that were
      missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
      renamed two request flags that had a superflous RW in them.
      
      Note that the flags are in bio.h despite having the REQ_ name - as
      blkdev.h includes bio.h that is the only way to go for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7b6d91da
  16. 11 12月, 2009 1 次提交
    • M
      dm kcopyd: accept zero size jobs · 9ca170a3
      Mikulas Patocka 提交于
      dm-kcopyd: accept zero-size jobs
      
      This patch changes dm-kcopyd so that it accepts zero-size jobs and completes
      them immediatelly via its completion thread.
      
      It is needed for multisnapshots snapshot resizing. When we are writing to
      a chunk beyond origin end, no copying is done. To simplify the code, we submit
      an empty request to kcopyd and let kcopyd complete it. If we didn't submit
      a request to kcopyd and called the completion routine immediatelly, it would
      violate the principle that completion is called only from one thread and
      it would need additional locking.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      9ca170a3
  17. 09 4月, 2009 2 次提交
    • M
      dm kcopyd: fix callback race · 340cd444
      Mikulas Patocka 提交于
      If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
      split_job/segment_complete and the subjobs complete before the loop in
      split_job completes, the kcopyd callback could be invoked from the
      thread that called dm_kcopyd_copy instead of the kcopyd workqueue.
      
      dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()
      
      Snapshots depend on the fact that callbacks are called from the singlethreaded
      kcopyd workqueue and expect that there is no racing between individual
      callbacks. The racing between callbacks can lead to corruption of exception
      store and it can also mean that exception store callbacks are called twice
      for the same exception - a likely reason for crashes reported inside
      pending_complete() / remove_exception().
      
      This patch fixes two problems:
      
      1. job->fn being called from the thread that submitted the job (see above).
      
      - Fix: hand over the completion callback to the kcopyd thread.
      
      2. job->fn(read_err, write_err, job->context); in segment_complete
      reports the error of the last subjob, not the union of all errors.
      
      - Fix: pass job->write_err to the callback to report all error bits
        (it is done already in run_complete_job)
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      340cd444
    • M
      dm kcopyd: prepare for callback race fix · 73830857
      Mikulas Patocka 提交于
      Use a variable in segment_complete() to point to the dm_kcopyd_client
      struct and only release job->pages in run_complete_job() if any are
      defined.  These changes are needed by the next patch.
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      73830857
  18. 18 2月, 2009 1 次提交
  19. 22 10月, 2008 2 次提交
    • M
      dm: remove dm header from targets · 586e80e6
      Mikulas Patocka 提交于
      Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
      Targets should not need direct access to internal DM structures.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      586e80e6
    • K
      dm kcopyd: avoid queue shuffle · b673c3a8
      Kazuo Ito 提交于
      Write throughput to LVM snapshot origin volume is an order
      of magnitude slower than those to LV without snapshots or
      snapshot target volumes, especially in the case of sequential
      writes with O_SYNC on.
      
      The following patch originally written by Kevin Jamieson and
      Jan Blunck and slightly modified for the current RCs by myself
      tries to improve the performance by modifying the behaviour
      of kcopyd, so that it pushes back an I/O job to the head of
      the job queue instead of the tail as process_jobs() currently
      does when it has to wait for free pages. This way, write
      requests aren't shuffled to cause extra seeks.
      
      I tested the patch against 2.6.27-rc5 and got the following results.
      The test is a dd command writing to snapshot origin followed by fsync
      to the file just created/updated.  A couple of filesystem benchmarks
      gave me similar results in case of sequential writes, while random
      writes didn't suffer much.
      
      dd if=/dev/zero of=<somewhere on snapshot origin> bs=4096 count=...
         [conv=notrunc when updating]
      
      1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
      average throughput (MB/s)
                           10M     100M    1000M
      create,dd         511.46   610.72    11.81
      create,dd+fsync     7.10     6.77     8.13
      update,dd         431.63   917.41    12.75
      update,dd+fsync     7.79     7.43     8.12
      
      compared with write throughput to LV without any snapshots,
      all dd+fsync and 1000 MiB writes perform very poorly.
      
                           10M     100M    1000M
      create,dd         555.03   608.98   123.29
      create,dd+fsync   114.27    72.78    76.65
      update,dd         152.34  1267.27   124.04
      update,dd+fsync   130.56    77.81    77.84
      
      2) linux 2.6.27-rc5 with the patch, write to snapshot origin,
      average throughput (MB/s)
      
                           10M     100M    1000M
      create,dd         537.06   589.44    46.21
      create,dd+fsync    31.63    29.19    29.23
      update,dd         487.59   897.65    37.76
      update,dd+fsync    34.12    30.07    26.85
      
      Although still not on par with plain LV performance -
      cannot be avoided because it's copy on write anyway -
      this simple patch successfully improves throughtput
      of dd+fsync while not affecting the rest.
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Signed-off-by: NKazuo Ito <ito.kazuo@oss.ntt.co.jp>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: stable@kernel.org
      b673c3a8
  20. 25 4月, 2008 5 次提交
    • M
      dm: unplug queues in threads · 7ff14a36
      Mikulas Patocka 提交于
      Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O.
      
      It is specified that any submitted bio without BIO_RW_SYNC flag may plug the
      queue (i.e. block the requests from being dispatched to the physical device).
      
      The queue is unplugged when the caller calls blk_unplug() function. Usually, the
      sequence is that someone calls submit_bh to submit IO on a buffer. The IO plugs
      the queue and waits (to be possibly joined with other adjacent bios). Then, when
      the caller calls wait_on_buffer(), it unplugs the queue and submits the IOs to
      the disk.
      
      This was happenning:
      
      When doing O_SYNC writes, function fsync_buffers_list() submits a list of
      bios to dm_raid1, the bios are added to dm_raid1 write queue and kmirrord is
      woken up.
      
      fsync_buffers_list() calls wait_on_buffer().  That unplugs the queue, but
      there are no bios on the device queue as they are still in the dm_raid1 queue.
      
      wait_on_buffer() starts waiting until the IO is finished.
      
      kmirrord is scheduled, kmirrord takes bios and submits them to the devices.
      
      The submitted bio plugs the harddisk queue but there is no one to unplug it.
      (The process that called wait_on_buffer() is already sleeping.)
      
      So there is a 3ms timeout, after which the queues on the harddisks are
      unplugged and requests are processed.
      
      This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes),
      dm-raid1 is 10 times slower than md raid1.
      
      Every time we submit something asynchronously via dm_io, we must unplug the
      queue actually to send the request to the device.
      
      This patch adds an unplug call to kmirrord - while processing requests, it keeps
      the queue plugged (so that adjacent bios can be merged); when it finishes
      processing all the bios, it unplugs the queue to submit the bios.
      
      It also fixes kcopyd which has the same potential problem. All kcopyd requests
      are submitted with BIO_RW_SYNC.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Acked-by: NJens Axboe <jens.axboe@oracle.com>
      7ff14a36
    • A
      dm: move include files · a765e20e
      Alasdair G Kergon 提交于
      Publish the dm-io, dm-log and dm-kcopyd headers in include/linux.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a765e20e
    • A
      dm kcopyd: rename · 2d1e580a
      Alasdair G Kergon 提交于
      Rename kcopyd.[ch] to dm-kcopyd.[ch].
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      2d1e580a
    • M
      dm kcopyd: remove redundant client counting · 945fa4d2
      Mikulas Patocka 提交于
      Remove client counting code that is no longer needed.
      
      Initialization and destruction is made globally from dm_init and dm_exit and is
      not based on client counts. Initialization allocates only one empty slab cache,
      so there is no negative impact from performing the initialization always,
      regardless of whether some client uses kcopyd or not.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      945fa4d2
    • M
      dm kcopyd: private mempool · 08d8757a
      Mikulas Patocka 提交于
      Change the global mempool in kcopyd into a per-device mempool to avoid
      deadlock possibilities.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      08d8757a