1. 10 May, 2014 2 commits
  2. 19 Apr, 2014 1 commit
    • T
      Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt · 8f28ed92
      Committed by Tang Chen
      In document numa_memory_policy.txt, the following examples for flag
      MPOL_F_RELATIVE_NODES are incorrect.
      
      	For example, consider a task that is attached to a cpuset with
      	mems 2-5 that sets an Interleave policy over the same set with
      	MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
      	interleave now occurs over nodes 3,5-6.  If the cpuset's mems
      	then change to 0,2-3,5, then the interleave occurs over nodes
      	0,3,5.
      
      According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
      the nodemasks the user specifies should be considered relative to the
      current task's mems_allowed.
      
       (https://lkml.org/lkml/2008/2/29/428)
      
      And according to numa_memory_policy.txt, if the user's nodemask includes
      nodes that are outside the range of the new set of allowed nodes, then
      the remap wraps around to the beginning of the nodemask and, if not
      already set, sets the node in the mempolicy nodemask.
      
      So in the example, if the user specifies 2-5 for a task whose
      mems_allowed is 3-7, the nodemask should be remapped to the third,
      fourth, fifth, and sixth nodes in mems_allowed, wrapping around past
      the end, like the following:
      
      	mems_allowed:       3  4  5  6  7
      
      	relative index:     0  1  2  3  4
      	                    5
      
      So the nodemasks should be remapped to 3,5-7, but not 3,5-6.
      
      And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
      remapped to 0,2-3,5, but not 0,3,5.
      
      	mems_allowed:       0  2  3  5
      
      	relative index:     0  1  2  3
      	                    4  5
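
The wrap-around in the two tables above can be checked with a small sketch. This is only an illustration of the remapping rule as described in this message, not the kernel code; the function name `remap_relative` is hypothetical.

```python
def remap_relative(user_nodes, mems_allowed):
    """Map a relative nodemask onto the allowed nodes, wrapping around."""
    allowed = sorted(mems_allowed)
    # Relative index i selects the (i mod len(allowed))-th allowed node,
    # so indices past the end wrap back to the beginning.
    return sorted({allowed[i % len(allowed)] for i in user_nodes})


if __name__ == "__main__":
    # User nodemask 2-5 against mems_allowed 3-7
    print(remap_relative(range(2, 6), [3, 4, 5, 6, 7]))  # [3, 5, 6, 7]
    # User nodemask 2-5 against mems_allowed 0,2-3,5
    print(remap_relative(range(2, 6), [0, 2, 3, 5]))     # [0, 2, 3, 5]
```

Both results match the corrected values given in this message (3,5-7 and 0,2-3,5).
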
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f28ed92
  3. 18 Apr, 2014 1 commit
    • D
      drm: Split out drm_probe_helper.c from drm_crtc_helper.c · 8d754544
      Committed by Daniel Vetter
      This is leftover stuff from my previous doc round which I kinda wanted
      to do but didn't yet due to rebase hell.
      
      The modeset helpers and the probing helpers are independent, and e.g.
      i915 uses the probing stuff but has its own modeset infrastructure. It
      hence makes sense to split this up. While at it, add a DOC: comment for
      the probing library.
      
      It would be rather neat to pull some of the DocBook documenting these
      two helpers into in-line DOC: comments. But unfortunately kerneldoc
      doesn't support markdown or something similar to make nice-looking
      documentation, so the current state is better.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      8d754544
  4. 17 Apr, 2014 8 commits
  5. 15 Apr, 2014 8 commits
  6. 14 Apr, 2014 1 commit
  7. 10 Apr, 2014 1 commit
  8. 08 Apr, 2014 18 commits
    • D
      9a6adb33
    • M
      doc/kernel-parameters.txt: add early_ioremap_debug · 56aeeba8
      Committed by Mark Salter
      Add description of early_ioremap_debug kernel parameter.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56aeeba8
    • M
      arm64: add early_ioremap support · bf4b558e
      Committed by Mark Salter
      Add support for early IO or memory mappings which are needed before the
      normal ioremap() is usable.  This also adds fixmap support for permanent
      fixed mappings such as that used by the earlyprintk device register
      region.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf4b558e
    • D
      asm/system.h: clean asm/system.h from docs · 95663285
      Committed by David Howells
      Clean asm/system.h from docs as nothing should refer to that header anymore.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95663285
    • J
      kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT · 5d2acfc7
      Committed by Josh Triplett
      "make allnoconfig" exists to ease testing of minimal configurations.
      Documentation/SubmitChecklist includes a note to test with allnoconfig.
      This helps catch missing dependencies on common-but-not-required
      functionality, which might otherwise go unnoticed.
      
      However, allnoconfig still leaves many symbols enabled, because they're
      hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT.  For instance, allnoconfig
      still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
      typically get build-tested with those disabled.
      
      To address this, introduce a new Kconfig option "allnoconfig_y", used on
      symbols which only exist to hide other symbols.  Set it on CONFIG_EMBEDDED
      (which then selects CONFIG_EXPERT).  allnoconfig will then disable all the
      symbols hidden behind those.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d2acfc7
    • F
      affs: add mount option to avoid filename truncates · 8ca57722
      Committed by Fabian Frederick
      Normal behavior for filenames exceeding specific filesystem limits is to
      refuse operation.
      
      The standard AFFS name length is only 30 characters, against 255 for
      usual Linux filesystems. The original implementation truncates
      filenames by default; refusing them instead requires enabling the
      AFFS_NO_TRUNCATE define, which means recompiling the module.

      This patch adds a 'nofilenametruncate' mount option so that the user
      can easily activate that feature and avoid a lot of problems (e.g.
      overwriting files ...)
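
A rough model of the two behaviours may help show why silent truncation can overwrite files. This is a sketch under assumptions, not the AFFS code: the helper name is hypothetical, names are counted in characters, and the use of ENAMETOOLONG as the refusal error is assumed rather than stated in this message.

```python
import errno

AFFS_NAME_MAX = 30  # standard AFFS name length given in this message


def affs_check_name(name, nofilenametruncate=False):
    """Return the name that would be stored, or raise if it is refused."""
    if len(name) <= AFFS_NAME_MAX:
        return name
    if nofilenametruncate:
        # Assumed error code for "refuse operation".
        raise OSError(errno.ENAMETOOLONG, "File name too long")
    # Default behaviour: silently keep only the first 30 characters.
    return name[:AFFS_NAME_MAX]


if __name__ == "__main__":
    a = "x" * 30 + "_report_v1"
    b = "x" * 30 + "_report_v2"
    # Two distinct long names collapse to the same stored name.
    print(affs_check_name(a) == affs_check_name(b))
```

With truncation, both long names map to the same 30-character name, so writing the second file would replace the first; with the option set, the operation is refused instead.
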
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ca57722
    • L
      hung_task: check the value of "sysctl_hung_task_timeout_sec" · 80df2847
      Committed by Liu Hua
      As sysctl_hung_task_timeout_sec is unsigned long, when this value is
      larger than LONG_MAX/HZ, the function schedule_timeout_interruptible in
      watchdog will return immediately without sleeping and print:

        schedule_timeout: wrong timeout value ffffffffffffff83

      and then the function watchdog will call schedule_timeout_interruptible
      again and again.  The screen will be filled with

      	"schedule_timeout: wrong timeout value ffffffffffffff83"

      This patch adds a check and correction in sysctl, so that the function
      schedule_timeout_interruptible always gets a valid parameter.
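
The overflow can be illustrated with a few lines of arithmetic. This is a sketch only: it assumes a 64-bit long, uses a hypothetical HZ of 250, and models the correction as a clamp, which is one possible reading of "check and correction" and may differ from what the patch does.

```python
LONG_MAX = 2**63 - 1  # assuming a 64-bit signed long
HZ = 250              # hypothetical tick rate, chosen only for illustration


def as_signed_long(value):
    """Interpret a 64-bit pattern as a signed long."""
    value &= 2**64 - 1
    return value - 2**64 if value > LONG_MAX else value


def clamp_timeout_secs(secs):
    """Limit the sysctl value so that secs * HZ still fits in a signed long."""
    return min(secs, LONG_MAX // HZ)


if __name__ == "__main__":
    # The value from the log message is negative when read as a signed long.
    print(as_signed_long(0xFFFFFFFFFFFFFF83))           # -125
    # One second past the limit already overflows after multiplying by HZ.
    print(as_signed_long((LONG_MAX // HZ + 1) * HZ) < 0)  # True
```
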
      Signed-off-by: Liu Hua <sdu.liu@huawei.com>
      Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80df2847
    • A
      rapidio: rework device hierarchy and introduce mport class of devices · 2aaf308b
      Committed by Alexandre Bounine
      This patch removes an artificial RapidIO bus root device and establishes
      the actual device hierarchy by providing references to real parent devices.
      It also introduces a device class for RapidIO controller devices (on-chip
      or an external bridge, known as "mport").
      
      The existing implementation was sufficient for SoC-based platforms that
      have a single RapidIO controller.  With the introduction of devices using
      multiple RapidIO controllers and PCIe-to-RapidIO bridges, the old scheme
      is very limiting or does not work at all.  The implemented changes make
      it possible to properly reference the platform's local RapidIO mport
      devices and provide the device details needed by upper layers.
      
      This change to RapidIO device hierarchy does not break any known
      existing kernel or user space interfaces.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
      Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2aaf308b
    • A
      proc: show mnt_id in /proc/pid/fdinfo · 49d063cb
      Committed by Andrey Vagin
      Currently there is no way to determine which mount point a file was
      opened from.  This information is required for properly dumping and
      restoring file descriptors in the presence of mount namespaces.  It is
      possible that two file descriptors are opened using the same path, but
      one fd references a mount point in one namespace while the other fd
      references a mount point in another namespace.
      
      $ ls -l /proc/1/fd/1
      lrwx------ 1 root root 64 Mar 19 23:54 /proc/1/fd/1 -> /dev/null
      
      $ cat /proc/1/fdinfo/1
      pos:	0
      flags:	0100002
      mnt_id:	16
      
      $ cat /proc/1/mountinfo | grep ^16
      16 32 0:4 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=1013356k,nr_inodes=253339,mode=755
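
The lookup done by hand above could be scripted roughly as follows. This is a sketch that works on the sample text from this message rather than on a live /proc; the helper names are hypothetical.

```python
FDINFO = "pos:\t0\nflags:\t0100002\nmnt_id:\t16\n"
MOUNTINFO = (
    "16 32 0:4 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs "
    "rw,size=1013356k,nr_inodes=253339,mode=755\n"
)


def parse_fdinfo(text):
    """Turn 'key:<tab>value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def mount_point_for(mnt_id, mountinfo_text):
    """Return the mount point whose mount ID (first field) equals mnt_id."""
    for line in mountinfo_text.splitlines():
        parts = line.split()
        if parts and int(parts[0]) == mnt_id:
            return parts[4]  # the fifth mountinfo field is the mount point
    return None


if __name__ == "__main__":
    mnt_id = int(parse_fdinfo(FDINFO)["mnt_id"])
    print(mnt_id, mount_point_for(mnt_id, MOUNTINFO))
```

On a real system the two strings would be read from /proc/<pid>/fdinfo/<fd> and /proc/<pid>/mountinfo.
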
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49d063cb
    • M
      zram: propagate error to user · 60a726e3
      Committed by Minchan Kim
      When zcomp is initialized with a single stream, max_comp_streams cannot
      be changed without a zram reset.  However, the current interface shows
      no error to the user and even changes the value of max_comp_streams
      without any effect, which is very confusing for the user.

      This patch prevents max_comp_streams from being changed when zcomp was
      initialized as a single-stream zcomp and reports the error to the user
      (e.g. to echo).
      
      [akpm@linux-foundation.org: don't return with the lock held, per Sergey]
      [fengguang.wu@intel.com: fix coccinelle warnings]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      60a726e3
    • S
      zram: make compression algorithm selection possible · e46b8a03
      Committed by Sergey Senozhatsky
      Add and document the `comp_algorithm' device attribute.  This attribute
      shows the supported compression algorithms, with the currently selected
      one in brackets:

      	cat /sys/block/zram0/comp_algorithm
      	[lzo] lz4

      and changes the selected compression algorithm:

      	echo lzo > /sys/block/zram0/comp_algorithm
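
A script consuming this attribute might split the output as in the sketch below. It assumes the format is exactly as in the sample output (space-separated names, the selected one in square brackets); the function name is hypothetical.

```python
def parse_comp_algorithm(text):
    """Split '[lzo] lz4' into (selected, supported)."""
    selected = None
    supported = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the selection
            selected = token
        supported.append(token)
    return selected, supported


if __name__ == "__main__":
    print(parse_comp_algorithm("[lzo] lz4\n"))  # ('lzo', ['lzo', 'lz4'])
```
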
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e46b8a03
    • S
      zram: add multi stream functionality · beca3ec7
      Committed by Sergey Senozhatsky
      Existing zram (zcomp) implementation has only one compression stream
      (buffer and algorithm private part), so in order to prevent data
      corruption only one write (compress operation) can use this compression
      stream, forcing all concurrent write operations to wait for stream lock
      to be released.  This patch changes zcomp to keep a compression streams
      list of user-defined size (via sysfs device attr).  Each write operation
      still exclusively holds compression stream, the difference is that we
      can have N write operations (depending on size of streams list)
      executing in parallel.  See TEST section later in commit message for
      performance data.
      
      Introduce struct zcomp_strm_multi and a set of functions to manage
      zcomp_strm stream access.  zcomp_strm_multi has a list of idle
      zcomp_strm structs, spinlock to protect idle list and wait queue, making
      it possible to perform parallel compressions.
      
      The following set of functions is added:
      - zcomp_strm_multi_find()/zcomp_strm_multi_release()
        find and release a compression stream, implement required locking
      - zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
        create and destroy zcomp_strm_multi
      
      zcomp ->strm_find() and ->strm_release() callbacks are set during
      initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
      correspondingly.
      
      Each time zcomp issues a zcomp_strm_multi_find() call, the following set
      of operations is performed:

      - spin lock strm_lock
      - if the idle list is not empty, remove a zcomp_strm from the idle list,
        spin unlock and return the zcomp stream pointer to the caller
      - if the idle list is empty, the current task adds itself to the wait
        queue. It will be woken by the zcomp_strm_multi_release() caller.
      
      zcomp_strm_multi_release():
      - spin lock strm_lock
      - add zcomp stream to idle list
      - spin unlock, wake up sleeper
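
The find/release steps listed above can be modelled in user space as follows. This is a toy sketch, not the zram code: a `threading.Condition` stands in for both strm_lock and the wait queue, and the class and stream names are hypothetical.

```python
import threading
from collections import deque


class StreamPool:
    """Toy model of an idle list of streams guarded by a lock and a wait queue."""

    def __init__(self, max_strm):
        self._idle = deque(f"strm{i}" for i in range(max_strm))
        # The Condition bundles the lock and the wait queue.
        self._cond = threading.Condition()

    def find(self):
        """Take an idle stream, sleeping while the idle list is empty."""
        with self._cond:
            while not self._idle:
                self._cond.wait()
            return self._idle.popleft()

    def release(self, strm):
        """Put the stream back on the idle list and wake one sleeper."""
        with self._cond:
            self._idle.append(strm)
            self._cond.notify()


if __name__ == "__main__":
    pool = StreamPool(2)

    def writer():
        strm = pool.find()   # at most two writers hold a stream at once
        pool.release(strm)

    threads = [threading.Thread(target=writer) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("all writers finished")
```

Each writer still holds its stream exclusively; the pool size only bounds how many writers can proceed in parallel.
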
      
      Minchan Kim reported that the spinlock-based locking scheme has
      demonstrated a severe performance regression for the single compression
      stream case, compared to the mutex-based one (see
      https://lkml.org/lkml/2014/2/18/16)
      
      base                      spinlock                    mutex
      
      ==Initial write           ==Initial write             ==Initial  write
      records:  5               records:  5                 records:   5
      avg:      1642424.35      avg:      699610.40         avg:       1655583.71
      std:      39890.95(2.43%) std:      232014.19(33.16%) std:       52293.96
      max:      1690170.94      max:      1163473.45        max:       1697164.75
      min:      1568669.52      min:      573429.88         min:       1553410.23
      ==Rewrite                 ==Rewrite                   ==Rewrite
      records:  5               records:  5                 records:   5
      avg:      1611775.39      avg:      501406.64         avg:       1684419.11
      std:      17144.58(1.06%) std:      15354.41(3.06%)   std:       18367.42
      max:      1641800.95      max:      531356.78         max:       1706445.84
      min:      1593515.27      min:      488817.78         min:       1655335.73
      
      When only one compression stream is available, a mutex with spin on owner
      tends to perform much better than frequent wait_event()/wake_up().  This
      is why the single stream is implemented as a special case with mutex
      locking.
      
      Introduce and document zram device attribute max_comp_streams.  This
      attr shows and stores current zcomp's max number of zcomp streams
      (max_strm).  Extend zcomp's zcomp_create() with `max_strm' parameter.
      `max_strm' limits the number of zcomp_strm structs in compression
      backend's idle list (max_comp_streams).
      
      max_comp_streams is used during initialisation as follows:
      -- passing max_strm equal to 1 to zcomp_create() will initialise zcomp
      using the single compression stream zcomp_strm_single (mutex-based locking).
      -- passing max_strm greater than 1 to zcomp_create() will initialise zcomp
      using the multi compression stream zcomp_strm_multi (spinlock-based locking).

      The default max_comp_streams value is 1, meaning that zram will be
      initialised with a single stream.

      A later patch will introduce a configuration knob to change
      max_comp_streams on an already initialised and used zcomp.
      
      TEST
      iozone -t 3 -R -r 16K -s 60M -I +Z
      
             test           base       1 strm (mutex)     3 strm (spinlock)
      -----------------------------------------------------------------------
       Initial write      589286.78       583518.39          718011.05
             Rewrite      604837.97       596776.38         1515125.72
        Random write      584120.11       595714.58         1388850.25
              Pwrite      535731.17       541117.38          739295.27
              Fwrite     1418083.88      1478612.72         1484927.06
      
      Usage example:
      set max_comp_streams to 4
              echo 4 > /sys/block/zram0/max_comp_streams
      
      show current max_comp_streams (default value is 1).
              cat /sys/block/zram0/max_comp_streams
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      beca3ec7
    • S
      zram: document failed_reads, failed_writes stats · 8dd1d324
      Committed by Sergey Senozhatsky
      Document `failed_reads' and `failed_writes' device attributes.
      Remove info about `discard' - there is no such zram attr.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8dd1d324
    • S
      zram: move zram size warning to documentation · e64cd51d
      Committed by Sergey Senozhatsky
      Move zram warning about disksize and size of memory correlation to zram
      documentation.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e64cd51d
    • M
      memcg: rename high level charging functions · d715ae08
      Committed by Michal Hocko
      mem_cgroup_newpage_charge is used only for charging anonymous memory so
      it is better to rename it to mem_cgroup_charge_anon.
      
      mem_cgroup_cache_charge is used for file backed memory so rename it to
      mem_cgroup_charge_file.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d715ae08
    • D
      res_counter: remove interface for locked charging and uncharging · 539a13b4
      Committed by David Rientjes
      The res_counter_{charge,uncharge}_locked() variants are not used in the
      kernel outside of the resource counter code itself, so remove the
      interface.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      539a13b4
    • K
      mm: introduce vm_ops->map_pages() · 8c6e50b0
      Committed by Kirill A. Shutemov
      Here's a new version of the faultaround patchset.  It took a while to
      tune it and collect performance data.

      The first patch adds a new callback ->map_pages to vm_operations_struct.

      ->map_pages() is called when the VM asks to map easily accessible pages.
      The filesystem should find and map pages associated with offsets from
      "pgoff" till "max_pgoff".  ->map_pages() is called with the page table
      locked and must not block.  If it's not possible to reach a page without
      blocking, the filesystem should skip it.  The filesystem should use
      do_set_pte() to set up the page table entry.  A pointer to the entry
      associated with offset "pgoff" is passed in the "pte" field of the
      vm_fault structure.  Pointers to entries for other offsets should be
      calculated relative to "pte".

      Currently the VM uses ->map_pages only on the read page fault path.  We
      try to map FAULT_AROUND_PAGES at a time.  FAULT_AROUND_PAGES is 16 for
      now.  Performance data for different FAULT_AROUND_ORDER values is below.
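
To make the relation between FAULT_AROUND_ORDER and the mapped range concrete, here is a small sketch. It is an illustration under assumptions, not the kernel logic: the function name is hypothetical, and aligning the window down to a multiple of FAULT_AROUND_PAGES and clipping it to the mapping is a placement policy assumed for the example.

```python
FAULT_AROUND_ORDER = 4
FAULT_AROUND_PAGES = 1 << FAULT_AROUND_ORDER  # 16 pages


def fault_around_window(fault_pgoff, vma_start_pgoff, vma_end_pgoff):
    """Return (pgoff, max_pgoff), the inclusive page-offset range to try to map."""
    # Assumed policy: a naturally aligned block that contains the faulting page.
    start = fault_pgoff & ~(FAULT_AROUND_PAGES - 1)
    end = start + FAULT_AROUND_PAGES - 1
    # Never reach outside the mapping [vma_start_pgoff, vma_end_pgoff).
    return max(start, vma_start_pgoff), min(end, vma_end_pgoff - 1)


if __name__ == "__main__":
    print(fault_around_window(37, 0, 100))  # (32, 47)
    print(fault_around_window(37, 35, 40))  # (35, 39)
```

With 16 pages mapped per fault, a sequential scan would be expected to take roughly one sixteenth of the minor faults, which is in line with the single-thread sequential figures in this message (4,195,437 at baseline versus 262,251 at order 4).
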
      
      TODO:
       - implement ->map_pages() for shmem/tmpfs;
       - modify get_user_pages() to be able to use ->map_pages() and implement
         mmap(MAP_POPULATE|MAP_NONBLOCK) on top.
      
      =========================================================================
      Tested on 4-socket machine (120 threads) with 128GiB of RAM.
      
      Few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
      somewhere between 3 and 5. Let's say 4 :)
      
      Linux build (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
      	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
      Linux rebuild (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
      	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
      Git test suite (make -j60 test)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
      	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413
      
      Two synthetic tests: access every word in file in sequential/random order.
      It doesn't improve much after FAULT_AROUND_ORDER == 4.
      
      Sequential access 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
      	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
       8 threads
      	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
      	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
       32 threads
      	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
      	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
       60 threads
      	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
      	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
       120 threads
      	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
      	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
      Random access 1GiB file
       1 thread
      	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
      	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
       8 threads
      	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
      	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
       32 threads
      	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
      	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
       60 threads
      	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
      	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
       120 threads
      	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
      	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594
      
      Touch only one page in page table in 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
      	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
       8 threads
      	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
      	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
       32 threads
      	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
      	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
       60 threads
      	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
      	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
       120 threads
      	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
      	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125
      
      This patch (of 2):
      
      Introduce a new vm_ops callback ->map_pages() and use it for mapping
      easily accessible pages around the fault address.

      On a read page fault, if the filesystem provides ->map_pages(), we try to
      map up to FAULT_AROUND_PAGES pages around the page fault address in the
      hope of reducing the number of minor page faults.

      We call ->map_pages first and use ->fault() as a fallback if the page at
      the offset is not ready to be mapped (cold page cache or something).
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ning Qu <quning@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c6e50b0
    • A
      sched: remove sleep_on() and friends · b8780c36
      Committed by Arnd Bergmann
      This is the final piece in the puzzle, as all patches to remove the
      last users of \(interruptible_\|\)sleep_on\(_timeout\|\) have made it
      into the 3.15 merge window. The work was long overdue, and this
      interface in particular should not have survived the BKL removal
      that was done a couple of years ago.
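
The grep-style pattern above covers four function names. A quick way to see which ones, sketched with Python's `re` syntax (where grouping and alternation are written without backslashes):

```python
import re

# Python spelling of \(interruptible_\|\)sleep_on\(_timeout\|\)
SLEEP_ON = re.compile(r"(interruptible_|)sleep_on(_timeout|)")

NAMES = [
    "sleep_on",
    "sleep_on_timeout",
    "interruptible_sleep_on",
    "interruptible_sleep_on_timeout",
]

if __name__ == "__main__":
    for name in NAMES:
        # Every one of the four variants matches the pattern in full.
        print(name, bool(SLEEP_ON.fullmatch(name)))
```
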
      
      Citing Jon Corbet from http://lwn.net/2001/0201/kernel.php3:
      
       "[...] it was suggested that the janitors look for and fix all code
        that calls sleep_on() [...] since (1) almost all such code is
        incorrect, and (2) Linus has agreed that those functions should
        be removed in the 2.5 development series".
      
      We haven't quite made it for 2.5, but maybe we can merge this for 3.15.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b8780c36