1. 23 Apr, 2015 1 commit
  2. 22 Apr, 2015 1 commit
  3. 20 Apr, 2015 3 commits
  4. 17 Apr, 2015 9 commits
  5. 16 Apr, 2015 24 commits
    • Doc: dt: arch_timer: discourage clock-frequency use · 4155fc07
      Mark Rutland committed
      The ARM Generic Timer (AKA the architected timer, arm_arch_timer)
      features a CPU register (CNTFRQ) which firmware is intended to
      initialize, and non-secure software can read to determine the frequency
      of the timer. On CPUs with secure state, this register cannot be written
      from non-secure states.
      
      The firmware of early SoCs featuring the timer did not initialize
      CNTFRQ correctly on all CPUs, requiring the frequency to be described
      in DT as a workaround. This workaround is incomplete, however, because
      CNTFRQ is exposed to all software in a privileged non-secure mode
      (including guests running under a hypervisor). The firmware and DTs for
      recent SoCs have followed the example set by these early SoCs.
      
      This patch updates the arch timer binding documentation to make it
      clearer that the use of the clock-frequency property is a poor
      work-around. The MMIO generic timer binding is similarly updated, though
      this is less of a concern as there is generally no need to expose the
      MMIO timers to guest OSs.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Acked-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • lib/vsprintf: add %pC{,n,r} format specifiers for clocks · 900cca29
      Geert Uytterhoeven committed
      Add format specifiers for printing struct clk:
        - '%pC' or '%pCn': name (Common Clock Framework) or address (legacy
          clock framework) of the clock,
        - '%pCr': rate of the clock.
      
      [akpm@linux-foundation.org: omit code if !CONFIG_HAVE_CLK]
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Turquette <mturquette@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/vsprintf: Move integer format types to the top · e8a7ba5f
      Geert Uytterhoeven committed
      Move the format types for 64-bit integers and configurable size integers
      to the top, so they're next to the other integer format types.  While at
      it, add the missing format types for s32 and u32.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Turquette <mturquette@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/vsprintf: document %p parameters passed by reference · 7330660e
      Geert Uytterhoeven committed
      This patch series improves the documentation for printk() formats, and
      adds support for printing clocks.  The latter has always been a hassle if
      you wanted to support both the common and legacy clock frameworks.
      
        - '%pC' and '%pCn' print the name (Common Clock Framework) or address
          (legacy clock framework) of a clock,
        - '%pCr' prints the current clock rate.
      
      This patch (of 3):
      
      Make sure all %p extensions that take parameters by reference are
      documented to do so.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Turquette <mturquette@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: deprecate zram attrs sysfs nodes · 8f7d282c
      Sergey Senozhatsky committed
      Add Documentation/ABI/obsolete/sysfs-block-zram file and list obsolete and
      deprecated attributes there.  The patch also adds additional information
      to zram documentation and describes the basic strategy:
      
      - the existing RW nodes will be downgraded to WO nodes (in 4.11)
      - deprecated RO sysfs nodes will eventually be removed (in 4.11)
      
      Users will be additionally notified about deprecated attr usage by
      pr_warn_once() (added to every deprecated attr _show()), as suggested by
      Minchan Kim.
      
      User space is advised to use zram<id>/stat, zram<id>/io_stat and
      zram<id>/mm_stat files.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: export new 'mm_stat' sysfs attrs · 4f2109f6
      Sergey Senozhatsky committed
      Per-device `zram<id>/mm_stat' file provides mm statistics of a particular
      zram device in a format similar to block layer statistics.  The file
      consists of a single line and represents the following stats (separated by
      whitespace):
      
              orig_data_size
              compr_data_size
              mem_used_total
              mem_limit
              mem_used_max
              zero_pages
              num_migrated
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
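The single-line mm_stat layout described in the commit above can be consumed with one read. The following is a minimal sketch, assuming the field order is exactly the order listed in the commit message; the helper names `parse_mm_stat` and `compression_ratio` are hypothetical and not part of any kernel or library API.

```python
from typing import Dict

# Field order as listed in the commit message for zram<id>/mm_stat.
MM_STAT_FIELDS = (
    "orig_data_size",
    "compr_data_size",
    "mem_used_total",
    "mem_limit",
    "mem_used_max",
    "zero_pages",
    "num_migrated",
)


def parse_mm_stat(line: str) -> Dict[str, int]:
    """Map one whitespace-separated mm_stat line onto the field names above."""
    values = [int(tok) for tok in line.split()]
    if len(values) < len(MM_STAT_FIELDS):
        raise ValueError(
            "expected at least %d fields, got %d" % (len(MM_STAT_FIELDS), len(values))
        )
    # zip() stops at the shorter input, so extra trailing columns are ignored.
    # Tolerating extra columns is an assumption of this sketch.
    return dict(zip(MM_STAT_FIELDS, values))


def compression_ratio(stats: Dict[str, int]) -> float:
    """Original size divided by compressed size (0.0 if nothing is stored)."""
    compr = stats["compr_data_size"]
    return stats["orig_data_size"] / compr if compr else 0.0
```

On a live system the input line would come from reading /sys/block/zram0/mm_stat once, which is the whole point of the grouped file: one open, one read, one close.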
    • zram: export new 'io_stat' sysfs attrs · 2f6a3bed
      Sergey Senozhatsky committed
      Per-device `zram<id>/io_stat' file provides accumulated I/O statistics of
      a particular zram device in a format similar to block layer statistics.  The
      file consists of a single line and represents the following stats
      (separated by whitespace):
      
              failed_reads
              failed_writes
              invalid_io
              notify_free
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
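Because io_stat holds accumulated counters, a monitoring tool would typically sample it twice and subtract. The sketch below illustrates that idea under the assumption that the file has exactly the four columns listed above, in that order; `IoStat`, `parse_io_stat` and `io_stat_delta` are illustrative names only.

```python
from collections import namedtuple

# Column order as listed in the commit message for zram<id>/io_stat.
IoStat = namedtuple(
    "IoStat", ["failed_reads", "failed_writes", "invalid_io", "notify_free"]
)


def parse_io_stat(line):
    """Parse one io_stat line into a named tuple of integer counters."""
    fields = line.split()
    if len(fields) != len(IoStat._fields):
        raise ValueError("expected %d fields, got %d" % (len(IoStat._fields), len(fields)))
    return IoStat(*(int(f) for f in fields))


def io_stat_delta(before, after):
    """Per-counter difference between two samples of the same device."""
    return IoStat(*(b - a for a, b in zip(before, after)))
```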
    • zram: describe device attrs in documentation · 77ba015f
      Sergey Senozhatsky committed
      Briefly describe exported device stat attrs in zram documentation.  We
      will eventually get rid of per-stat sysfs nodes and, thus, clean up
      Documentation/ABI/testing/sysfs-block-zram file, which is the only source
      of information about device sysfs nodes.
      
      Add a `num_migrated' description.  Since there is no independent
      `num_migrated' sysfs node (and no corresponding sysfs-block-zram entry),
      it will be exported via the zram<id>/mm_stat file.
      
      At this point we can provide minimal description, because sysfs-block-zram
      still contains detailed information.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove `num_migrated' device attr · 10447b60
      Sergey Senozhatsky committed
      This patch introduces a rework of zram stats.  We have per-stat sysfs
      nodes, which are hard to use from user space: they do not give an
      immediate stats 'snapshot', and they require more syscalls (open, read
      and close for every stat file, with appropriate error checks at every
      step).
      
      First, zram now accounts block layer statistics, available in
      /sys/block/zram<id>/stat and /proc/diskstats files.  So some new stats are
      available (see Documentation/block/stat.txt), besides, zram's activities
      now can be monitored by sysstat's iostat or similar tools.
      
      Example:
      cat /sys/block/zram0/stat
      248     0    1984    0   251029     0  2008232   5120   0   5116   5116
      
      Second, group the nodes currently exported on a per-stat basis into two
      categories (files):
      
      -- zram<id>/io_stat
      accumulates device's IO stats, that are not accounted by block layer,
      and contains:
              failed_reads
              failed_writes
              invalid_io
              notify_free
      
      Example:
      cat /sys/block/zram0/io_stat
      0        0        0   652572
      
      -- zram<id>/mm_stat
      accumulates zram mm stats and contains:
              orig_data_size
              compr_data_size
              mem_used_total
              mem_limit
              mem_used_max
              zero_pages
              num_migrated
      
      Example:
      cat /sys/block/zram0/mm_stat
      434634752 270288572 279158784        0 579895296    15060        0
      
      per-stat sysfs nodes are now considered to be deprecated and we plan to
      remove them (and clean up some of the existing stat code) in two years (as
      of now, there is no warning printed to syslog about deprecated stats being
      used).  User space is advised to use the above mentioned 3 files.
      
      This patch (of 7):
      
      Remove sysfs `num_migrated' attribute.  We are moving away from per-stat
      device attrs towards 3 stat files that will accumulate io and mm stats in
      a format similar to block layer statistics in /sys/block/<dev>/stat.  That
      will be easier to use in user space, and reduce the number of syscalls
      needed to read zram device statistics.
      
      `num_migrated' will return in the zram<id>/mm_stat file.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
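The block layer line shown in the /sys/block/zram0/stat example can be decoded as follows. This is a sketch, assuming the eleven-column layout documented in Documentation/block/stat.txt at the time; the Python field names are labels chosen here for those columns, and `parse_block_stat` is a hypothetical helper.

```python
from collections import namedtuple

# One name per column of /sys/block/<dev>/stat (labels chosen for this sketch).
BlockStat = namedtuple("BlockStat", [
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
])

SECTOR_BYTES = 512  # the block layer stat file counts 512-byte sectors


def parse_block_stat(line):
    """Parse the first eleven columns of a block layer stat line."""
    values = [int(tok) for tok in line.split()]
    wanted = len(BlockStat._fields)
    if len(values) < wanted:
        raise ValueError("expected at least %d fields, got %d" % (wanted, len(values)))
    # Extra columns are ignored; tolerating them is an assumption of this sketch.
    return BlockStat(*values[:wanted])


def bytes_written(stat):
    """Convert the written-sectors counter into bytes."""
    return stat.write_sectors * SECTOR_BYTES
```

Reading this file plus io_stat and mm_stat gives a full device snapshot with three open/read/close sequences instead of one per statistic.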
    • zsmalloc: zsmalloc documentation · d02be50d
      Minchan Kim committed
      Create a zsmalloc doc which explains the design concept and stat information.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Gunho Lee <gunho.lee@lge.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: support compaction · 4e3ba878
      Minchan Kim committed
      Now that zsmalloc supports compaction, zram can use it.  As a first
      step, this patch exports a compact knob via sysfs so the user can
      trigger compaction via "echo 1 > /sys/block/zram0/compact".
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Gunho Lee <gunho.lee@lge.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
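The echo command in the commit above amounts to writing "1" to the compact node. A minimal sketch of the same operation from a script follows; `trigger_compaction` is a hypothetical helper, and the `sysfs_block` parameter exists only so the function can be pointed at a scratch directory instead of the real /sys/block.

```python
import os


def trigger_compaction(device="zram0", sysfs_block="/sys/block"):
    """Write "1" to <sysfs_block>/<device>/compact and return the path used."""
    path = os.path.join(sysfs_block, device, "compact")
    with open(path, "w") as knob:
        knob.write("1")
    return path
```

On a real system this needs sufficient privileges and a kernel that exposes the compact attribute.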
    • mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP · dd906184
      Boaz Harrosh committed
      This will allow filesystems that use VM_PFNMAP | VM_MIXEDMAP (no page
      structs) to get notified when access is a write to a read-only PFN.
      
      This can happen if we mmap() a file, then first mmap-read from it to
      page-in a read-only PFN, then mmap-write to the same page.
      
      We need this functionality to fix a DAX bug, where in the scenario above
      we fail to set ctime/mtime though we modified the file.  An xfstest is
      attached to this patchset that shows the failure and the fix.  (A DAX
      patch will follow)
      
      This functionality is extra important for us, because upon dirtying of a
      pmem page we also want to RDMA the page to a remote cluster node.
      
      We define a new pfn_mkwrite and do not reuse page_mkwrite because
        1 - The name ;-)
        2 - But mainly because it would take a very long and tedious
            audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
            users to make sure they do not now crash. For example, the
            current DAX code (which this is for) would crash.
            If we wanted to reuse page_mkwrite, we would first need to
            patch all users so that they do not crash when there is no
            page, and only then enable this patch. But even if I did that
            I would not sleep so well at night.
            Adding a new vector is the safest thing to do, and is not that
            expensive: an extra pointer in a static function vector per
            driver. The new vector is also better for performance, because
            otherwise we would call all current kernel vectors just so
            they can check "no page, do nothing" and return.
      
      No need to call it from do_shared_fault because do_wp_page is called to
      change pte permissions anyway.
      Signed-off-by: Yigal Korman <yigal@plexistor.com>
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
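The user-space access pattern that this commit is concerned with (map a file, read a page first, then write to the same page) can be reproduced with a few lines. This is only a sketch of the sequence of accesses; it runs on any filesystem and does not by itself demonstrate the DAX bug or the fix. The helper name `read_then_write` is hypothetical.

```python
import mmap


def read_then_write(path, offset=0):
    """Read one byte through a shared mapping, then write to the same page.

    Returns the byte value that was read before the write.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mapping:
            first = mapping[offset]                 # read access pages the data in
            mapping[offset] = (first + 1) % 256     # write access to the same page
            mapping.flush()
    return first
```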
    • mm, doc: cleanup and clarify munmap behavior for hugetlb memory · 80d6b94b
      David Rientjes committed
      munmap(2) of hugetlb memory requires a length that is hugepage aligned,
      otherwise it may fail.  Add this to the documentation.
      
      This also cleans up the documentation and separates it into logical units:
      one part refers to MAP_HUGETLB and another part refers to requirements for
      shared memory segments.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
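To satisfy the alignment requirement described above, a caller can round the length up to a multiple of the huge page size before calling munmap(2). The sketch below shows the arithmetic; `align_up` and `parse_hugepage_size` are hypothetical helpers, and the sketch assumes the huge page size is reported in /proc/meminfo as a `Hugepagesize:` line in kB.

```python
def align_up(length, hugepage_size):
    """Round length up to the next multiple of hugepage_size (a power of two)."""
    if hugepage_size <= 0 or hugepage_size & (hugepage_size - 1):
        raise ValueError("huge page size must be a positive power of two")
    return (length + hugepage_size - 1) & ~(hugepage_size - 1)


def parse_hugepage_size(meminfo_text):
    """Return the default huge page size in bytes from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            # Expected shape: "Hugepagesize:       2048 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("no Hugepagesize line found")
```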
    • hugetlbfs: document min_size mount option and cleanup · 8c9b9703
      Mike Kravetz committed
      Add min_size mount option to the hugetlbfs documentation.  Also, add the
      missing pagesize option and mention that size can be specified as bytes or
      a percentage of huge page pool.
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation/vm/unevictable-lru.txt: document interaction between compaction... · 922c0551
      Eric B Munson committed
      Documentation/vm/unevictable-lru.txt: document interaction between compaction and the unevictable LRU
      
      The memory compaction code uses the migration code to do most of the
      work in compaction.  However, the compaction code interacts with the
      unevictable LRU differently than migration code and this difference
      should be noted in the documentation.
      
      [akpm@linux-foundation.org: identify /proc/sys/vm/compact_unevictable directly]
      Signed-off-by: Eric B Munson <emunson@akamai.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: allow compaction of unevictable pages · 5bbe3547
      Eric B Munson committed
      Currently, pages which are marked as unevictable are protected from
      compaction, but not from other types of migration.  The POSIX real time
      extension explicitly states that mlock() will prevent a major page
      fault, but the spirit of this is that mlock() should give a process the
      ability to control sources of latency, including minor page faults.
      However, the mlock manpage only explicitly says that a locked page will
      not be written to swap and this can cause some confusion.  The
      compaction code today does not give a developer who wants to avoid swap
      but wants to have large contiguous areas available any method to achieve
      this state.  This patch introduces a sysctl for controlling compaction
      behavior with respect to the unevictable lru.  Users who demand no page
      faults after a page is present can set compact_unevictable_allowed to 0
      and users who need the large contiguous areas can enable compaction on
      locked memory by leaving the default value of 1.
      
      To illustrate this problem I wrote a quick test program that mmaps a
      large number of 1MB files filled with random data.  These maps are
      created locked and read only.  Then every other mmap is unmapped and I
      attempt to allocate huge pages to the static huge page pool.  When the
      compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
      after fragmenting memory.  When the value is set to 1, allocations
      succeed.
      Signed-off-by: Eric B Munson <emunson@akamai.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
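The sysctl introduced above is exposed as /proc/sys/vm/compact_unevictable_allowed (the path named in the related documentation commit). A minimal sketch of reading and setting it from a script follows; the helper names are hypothetical and the `path` parameter is there so the functions can be exercised against an ordinary file.

```python
SYSCTL_PATH = "/proc/sys/vm/compact_unevictable_allowed"


def compaction_of_unevictable_allowed(path=SYSCTL_PATH):
    """Return True unless the sysctl currently reads 0."""
    with open(path) as f:
        return f.read().strip() != "0"


def set_compaction_of_unevictable(allowed, path=SYSCTL_PATH):
    """Write 1 (compaction may move locked pages) or 0 (it may not)."""
    with open(path, "w") as f:
        f.write("1" if allowed else "0")
```

Per the commit message, 0 suits users who want no page faults once a page is present, while the default of 1 suits users who need large contiguous areas.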
    • rdma: replace deprecated ifconfig in doc · b962dc0a
      Stephen Hemminger committed
      The ifconfig command has been deprecated for many years.
      To encourage new users to stop using it and to learn iproute2
      instead, ifconfig should not be used in examples.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • Input: alps - document separate pointstick button bits for V2 devices · 2310568f
      Hans de Goede committed
      Non-interleaved dualpoint v2 devices have separate pointstick button
      bits; document this.
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    • dm crypt: update URLs to new cryptsetup project page · e44f23b3
      Milan Broz committed
      Cryptsetup home page moved to GitLab.
      Also remove link to abandoned Truecrypt page.
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add log writes target · 0e9cebe7
      Josef Bacik committed
      Introduce a new target that is meant for file system developers to test file
      system integrity at particular points in the life of a file system.  We capture
      all write requests and associated data and log them to a separate device
      for later replay.  There is a userspace utility to do this replay.  The
      idea behind this is to give file system developers a tool to verify that
      the file system is always consistent.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Zach Brown <zab@zabbo.net>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
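The log-and-replay idea described above can be illustrated with a toy model. This is not the on-disk format of the log writes target nor its userspace replay utility; it only assumes a log that is an ordered list of (offset, data) writes and shows how replaying a prefix of it reconstructs the device contents at a chosen point in time. The function name `replay` is hypothetical.

```python
def replay(log, device_size, upto):
    """Rebuild a device image by applying the first `upto` logged writes.

    log: ordered list of (offset, data) tuples, oldest first.
    """
    image = bytearray(device_size)
    for offset, data in log[:upto]:
        if offset < 0 or offset + len(data) > device_size:
            raise ValueError("write at %d does not fit on the device" % offset)
        image[offset:offset + len(data)] = data
    return bytes(image)
```

A consistency checker would then be run against each reconstructed image, for example after every logged flush.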
    • dm verity: add error handling modes for corrupted blocks · 65ff5b7d
      Sami Tolvanen committed
      Add device specific modes to dm-verity to specify how corrupted
      blocks should be handled.  The following modes are defined:
      
        - DM_VERITY_MODE_EIO is the default behavior, where reading a
          corrupted block results in -EIO.
      
        - DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does
          not block the read.
      
        - DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted
          block is discovered.
      
      In addition, each mode sends a uevent to notify userspace of
      corruption and to allow further recovery actions.
      
      The driver defaults to previous behavior (DM_VERITY_MODE_EIO)
      and other modes can be enabled with an additional parameter to
      the verity table.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
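The three modes described above can be summarised as a small behavioural model. This is a user-space sketch of the described semantics, not the driver code: the `notify`, `log` and `restart` callbacks stand in for the uevent, the kernel log and kernel_restart respectively, and `handle_corrupted_block` is a hypothetical name.

```python
import enum
import errno


class VerityMode(enum.Enum):
    EIO = "DM_VERITY_MODE_EIO"          # default: the read fails with -EIO
    LOGGING = "DM_VERITY_MODE_LOGGING"  # log only, the read is not blocked
    RESTART = "DM_VERITY_MODE_RESTART"  # restart the system


def handle_corrupted_block(mode, block, data, notify, log, restart):
    """Model what happens when a corrupted block is discovered."""
    notify(block)  # every mode notifies user space of the corruption
    if mode is VerityMode.EIO:
        raise OSError(errno.EIO, "corrupted block %d" % block)
    if mode is VerityMode.LOGGING:
        log("corrupted block %d" % block)
        return data
    if mode is VerityMode.RESTART:
        restart()
        return None
    raise ValueError("unknown mode: %r" % (mode,))
```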
    • dm thin: remove stale 'trim' message documentation · 0e0e32c1
      Mike Snitzer committed
      The 'trim' message was never implemented.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8
      Mike Snitzer committed
      Request-based DM's blk-mq support defaults to off; but a user can easily
      change the default using the dm_mod.use_blk_mq module/boot option.
      
      Also, you can check what mode a given request-based DM device is using
      with: cat /sys/block/dm-X/dm/use_blk_mq
      
      This change enabled further cleanup and reduced work (e.g. the
      md->io_pool and md->rq_pool aren't created if using blk-mq).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: impose configurable deadline for dm_request_fn's merge heuristic · 0ce65797
      Mike Snitzer committed
      Otherwise, for sequential workloads, the dm_request_fn can allow
      excessive request merging at the expense of increased service time.
      
      Add a per-device sysfs attribute to allow the user to control how long a
      request that is a reasonable merge candidate can be queued on the
      request queue.  The resolution of this request dispatch deadline is in
      microseconds (ranging from 1 to 100000 usecs).  To set a 20us deadline:
        echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline
      
      The dm_request_fn's merge heuristic and associated extra accounting is
      disabled by default (rq_based_seq_io_merge_deadline is 0).
      
      This sysfs attribute is not applicable to bio-based DM devices so it
      will only ever report 0 for them.
      
      By allowing a request to remain on the queue it will block other
      requests on the queue.  But introducing a short dequeue delay has proven
      very effective at enabling certain sequential IO workloads on really
      fast, yet IOPS constrained, devices to build up slightly larger IOs --
      yielding 90+% throughput improvements.  Having precise control over the
      time taken to wait for larger requests to build affords control beyond
      that of waiting for certain IO sizes to accumulate (which would require
      a deadline anyway).  This knob will only ever make sense with sequential
      IO workloads and the particular value used is storage configuration
      specific.
      
      Given the expected niche use-case for when this knob is useful it has
      been deemed acceptable to expose this relatively crude method for
      crafting optimal IO on specific storage -- especially given the solution
      is simple yet effective.  In the context of DM multipath, it is
      advisable to tune this sysfs attribute to a value that offers the best
      performance for the common case (e.g. if 4 paths are expected active,
      tune for that; if paths fail then performance may be slightly reduced).
      
      Alternatives were explored to have request-based DM autotune this value
      (e.g. if/when paths fail) but they were quickly deemed too fragile and
      complex to warrant further design and development time.  If this problem
      proves more common as faster storage emerges we'll have to look at
      elevating a generic solution into the block core.
      Tested-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
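The echo command in the commit above can be wrapped in a small helper that checks the value before writing it. This is a sketch under stated assumptions: the range check (0 to disable, otherwise 1 to 100000 usecs) is done on the caller's side as described in the commit message, how the kernel itself treats out-of-range values is not modelled, and `set_merge_deadline` is a hypothetical name.

```python
import os

MAX_DEADLINE_USECS = 100000  # upper bound stated in the commit message


def set_merge_deadline(dm_name, usecs, sysfs_block="/sys/block"):
    """Write a merge deadline in microseconds for a request-based DM device.

    A value of 0 leaves the merge heuristic disabled (the default).
    """
    if not 0 <= usecs <= MAX_DEADLINE_USECS:
        raise ValueError("deadline must be between 0 and %d usecs" % MAX_DEADLINE_USECS)
    path = os.path.join(sysfs_block, dm_name, "dm", "rq_based_seq_io_merge_deadline")
    with open(path, "w") as attr:
        attr.write(str(usecs))
    return path
```

For example, `set_merge_deadline("dm-7", 20)` would correspond to the 20us setting shown in the commit message.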
  6. 15 Apr, 2015 2 commits