1. 20 11月, 2014 1 次提交
  2. 24 10月, 2014 2 次提交
  3. 16 10月, 2014 1 次提交
  4. 14 10月, 2014 1 次提交
  5. 09 10月, 2014 1 次提交
  6. 08 10月, 2014 3 次提交
  7. 26 9月, 2014 3 次提交
  8. 24 9月, 2014 1 次提交
  9. 16 9月, 2014 1 次提交
    • J
      f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02
      Jaegeuk Kim 提交于
      If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
      only starts to try in-place-updates.
      And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
      keeps out-of-order manner. Otherwise, it triggers in-place-updates.
      
      This may be used by storage showing very high random write performance.
      
      For example, it can be used when,
      
      Seq. writes (Data) + wait + Seq. writes (Node)
      
      is pretty much slower than,
      
      Rand. writes (Data)
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c1ce1b02
  10. 08 9月, 2014 2 次提交
  11. 14 8月, 2014 1 次提交
  12. 08 8月, 2014 2 次提交
  13. 02 8月, 2014 2 次提交
  14. 29 7月, 2014 1 次提交
  15. 18 7月, 2014 1 次提交
  16. 16 7月, 2014 1 次提交
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  17. 07 6月, 2014 2 次提交
  18. 04 6月, 2014 1 次提交
  19. 31 5月, 2014 1 次提交
  20. 26 5月, 2014 1 次提交
  21. 23 5月, 2014 1 次提交
  22. 07 5月, 2014 2 次提交
    • A
      new methods: ->read_iter() and ->write_iter() · 293bc982
      Al Viro 提交于
      Beginning to introduce those.  Just the callers for now, and it's
      clumsier than it'll eventually become; once we finish converting
      aio_read and aio_write instances, the things will get nicer.
      
      For now, these guys are in parallel to ->aio_read() and ->aio_write();
      they take iocb and iov_iter, with everything in iov_iter already
      validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
      in iov_iter.
      
      Main concerns in that series are stack footprint and ability to
      split the damn thing cleanly.
      
      [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      293bc982
    • A
      pass iov_iter to ->direct_IO() · d8d3d94b
      Al Viro 提交于
      unmodified, for now
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d8d3d94b
  23. 05 5月, 2014 1 次提交
  24. 08 4月, 2014 3 次提交
    • F
      affs: add mount option to avoid filename truncates · 8ca57722
      Fabian Frederick 提交于
      Normal behavior for filenames exceeding specific filesystem limits is to
      refuse operation.
      
      AFFS standard name length being only 30 characters against 255 for usual
      Linux filesystems, original implementation does filename truncate by
      default with a define value AFFS_NO_TRUNCATE which can be enabled but
      needs module compilation.
      
      This patch adds 'nofilenametruncate' mount option so that user can
      easily activate that feature and avoid a lot of problems (eg overwrite
      files ...)
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ca57722
    • A
      proc: show mnt_id in /proc/pid/fdinfo · 49d063cb
      Andrey Vagin 提交于
      Currently we don't have a way how to determing from which mount point
      file has been opened.  This information is required for proper dumping
      and restoring file descriptos due to presence of mount namespaces.  It's
      possible, that two file descriptors are opened using the same paths, but
      one fd references mount point from one namespace while the other fd --
      from other namespace.
      
      $ ls -l /proc/1/fd/1
      lrwx------ 1 root root 64 Mar 19 23:54 /proc/1/fd/1 -> /dev/null
      
      $ cat /proc/1/fdinfo/1
      pos:	0
      flags:	0100002
      mnt_id:	16
      
      $ cat /proc/1/mountinfo | grep ^16
      16 32 0:4 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=1013356k,nr_inodes=253339,mode=755
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49d063cb
    • K
      mm: introduce vm_ops->map_pages() · 8c6e50b0
      Kirill A. Shutemov 提交于
      Here's new version of faultaround patchset.  It took a while to tune it
      and collect performance data.
      
      First patch adds new callback ->map_pages to vm_operations_struct.
      
      ->map_pages() is called when VM asks to map easy accessible pages.
      Filesystem should find and map pages associated with offsets from
      "pgoff" till "max_pgoff".  ->map_pages() is called with page table
      locked and must not block.  If it's not possible to reach a page without
      blocking, filesystem should skip it.  Filesystem should use do_set_pte()
      to setup page table entry.  Pointer to entry associated with offset
      "pgoff" is passed in "pte" field in vm_fault structure.  Pointers to
      entries for other offsets should be calculated relative to "pte".
      
      Currently VM use ->map_pages only on read page fault path.  We try to
      map FAULT_AROUND_PAGES a time.  FAULT_AROUND_PAGES is 16 for now.
      Performance data for different FAULT_AROUND_ORDER is below.
      
      TODO:
       - implement ->map_pages() for shmem/tmpfs;
       - modify get_user_pages() to be able to use ->map_pages() and implement
         mmap(MAP_POPULATE|MAP_NONBLOCK) on top.
      
      =========================================================================
      Tested on 4-socket machine (120 threads) with 128GiB of RAM.
      
      Few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
      somewhere between 3 and 5. Let's say 4 :)
      
      Linux build (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
      	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
      Linux rebuild (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
      	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
      Git test suite (make -j60 test)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
      	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413
      
      Two synthetic tests: access every word in file in sequential/random order.
      It doesn't improve much after FAULT_AROUND_ORDER == 4.
      
      Sequential access 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
      	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
       8 threads
      	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
      	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
       32 threads
      	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
      	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
       60 threads
      	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
      	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
       120 threads
      	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
      	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
      Random access 1GiB file
       1 thread
      	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
      	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
       8 threads
      	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
      	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
       32 threads
      	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
      	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
       60 threads
      	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
      	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
       120 threads
      	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
      	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594
      
      Touch only one page in page table in 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
      	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
       8 threads
      	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
      	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
       32 threads
      	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
      	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
       60 threads
      	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
      	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
       120 threads
      	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
      	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125
      
      This patch (of 2):
      
      Introduce new vm_ops callback ->map_pages() and uses it for mapping easy
      accessible pages around fault address.
      
      On read page fault, if filesystem provides ->map_pages(), we try to map up
      to FAULT_AROUND_PAGES pages around page fault address in hope to reduce
      number of minor page faults.
      
      We call ->map_pages first and use ->fault() as fallback if page by the
      offset is not ready to be mapped (cold page cache or something).
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ning Qu <quning@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c6e50b0
  25. 07 4月, 2014 1 次提交
    • J
      f2fs: introduce f2fs_issue_flush to avoid redundant flush issue · 6b4afdd7
      Jaegeuk Kim 提交于
      Some storage devices show relatively high latencies to complete cache_flush
      commands, even though their normal IO speed is prettry much high. In such
      the case, it needs to merge cache_flush commands as much as possible to avoid
      issuing them redundantly.
      So, this patch introduces a mount option, "-o flush_merge", to mitigate such
      the overhead.
      
      If this option is enabled by user, F2FS merges the cache_flush commands and then
      issues just one cache_flush on behalf of them. Once the single command is
      finished, F2FS sends a completion signal to all the pending threads.
      
      Note that, this option can be used under a workload consisting of very intensive
      concurrent fsync calls, while the storage handles cache_flush commands slowly.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      6b4afdd7
  26. 04 4月, 2014 3 次提交