1. 23 2月, 2013 1 次提交
  2. 22 2月, 2013 1 次提交
    • D
      mm: only enforce stable page writes if the backing device requires it · 1d1d1a76
      Darrick J. Wong 提交于
      Create a helper function to check if a backing device requires stable
      page writes and, if so, performs the necessary wait.  Then, make it so
      that all points in the memory manager that handle making pages writable
      use the helper function.  This should provide stable page write support
      to most filesystems, while eliminating unnecessary waiting for devices
      that don't require the feature.
      
      Before this patchset, all filesystems would block, regardless of whether
      or not it was necessary.  ext3 would wait, but still generate occasional
      checksum errors.  The network filesystems were left to do their own
      thing, so they'd wait too.
      
      After this patchset, all the disk filesystems except ext3 and btrfs will
      wait only if the hardware requires it.  ext3 (if necessary) snapshots
      pages instead of blocking, and btrfs provides its own bdi so the mm will
      never wait.  Network filesystems haven't been touched, so either they
      provide their own stable page guarantees or they don't block at all.
      The blocking behavior is back to what it was before 3.0 if you don't
      have a disk requiring stable page writes.
      
      Here's the result of using dbench to test latency on ext2:
      
      3.8.0-rc3:
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       WriteX        109347     0.028    59.817
       ReadX         347180     0.004     3.391
       Flush          15514    29.828   287.283
      
      Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms
      
      3.8.0-rc3 + patches:
       WriteX        105556     0.029     4.273
       ReadX         335004     0.005     4.112
       Flush          14982    30.540   298.634
      
      Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms
      
      As you can see, the maximum write latency drops considerably with this
      patch enabled.  The other filesystems (ext3/ext4/xfs/btrfs) behave
      similarly, but see the cover letter for those results.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d1d1a76
  3. 21 12月, 2012 1 次提交
  4. 09 10月, 2012 1 次提交
    • K
      mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Konstantin Khlebnikov 提交于
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      Filesystems must implement this method to get non-linear mapping support,
      if it uses filemap_fault() then generic_file_remap_pages() can be used.
      
      Now device drivers can implement this method and obtain nonlinear vma support.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b173bc4
  5. 01 10月, 2012 1 次提交
    • T
      ext4: fix mtime update in nodelalloc mode · 041bbb6d
      Theodore Ts'o 提交于
      Commits 5e8830dc and 41c4d25f introduced a regression into
      v3.6-rc1 for ext4 in nodealloc mode, such that mtime updates would not
      take place for files modified via mmap if the page was already in the
      page cache.  This would also affect ext3 file systems mounted using
      the ext4 file system driver.
      
      The problem was that ext4_page_mkwrite() had a shortcut which would
      avoid calling __block_page_mkwrite() under some circumstances, and the
      above two commit transferred the responsibility of calling
      file_update_time() to __block_page_mkwrite --- which woudln't get
      called in some circumstances.
      
      Since __block_page_mkwrite() only has three callers,
      block_page_mkwrite(), ext4_page_mkwrite, and nilfs_page_mkwrite(), the
      best way to solve this is to move the responsibility for calling
      file_update_time() to its caller.
      
      This problem was found via xfstests #215 with a file system mounted
      with -o nodelalloc.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: stable@vger.kernel.org
      041bbb6d
  6. 31 7月, 2012 1 次提交
    • J
      nilfs2: Convert to new freezing mechanism · 2c22b337
      Jan Kara 提交于
      We change nilfs_page_mkwrite() to provide proper freeze protection for
      writeable page faults (we must wait for frozen filesystem even if the
      page is fully mapped).
      
      We remove all vfs_check_frozen() checks since they are now handled by
      the generic code.
      
      CC: linux-nilfs@vger.kernel.org
      CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2c22b337
  7. 01 6月, 2012 1 次提交
  8. 21 7月, 2011 1 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  9. 10 5月, 2011 1 次提交
    • R
      nilfs2: get rid of private page allocator · 1cb2d38c
      Ryusuke Konishi 提交于
      Previously, nilfs was cloning pages for mmapped region to freeze their
      data and ensure consistency of checksum during writeback cycles.  A
      private page allocator was used for this page cloning.  But, we no
      longer need to do that since clear_page_dirty_for_io function sets up
      pte so that vm_ops->page_mkwrite function is called right before the
      mmapped pages are modified and nilfs_page_mkwrite function can safely
      wait for the pages to be written back to disk.
      
      So, this stops making a copy of mmapped pages during writeback, and
      eliminates the private page allocation and deallocation functions from
      nilfs.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      1cb2d38c
  10. 30 3月, 2011 1 次提交
    • R
      nilfs2: fix data loss in mmap page write for hole blocks · 34094537
      Ryusuke Konishi 提交于
      From the result of a function test of mmap, mmap write to shared pages
      turned out to be broken for hole blocks.  It doesn't write out filled
      blocks and the data will be lost after umount.  This is due to a bug
      that the target file is not queued for log writer when filling hole
      blocks.
      
      Also, nilfs_page_mkwrite function exits normal code path even after
      successfully filled hole blocks due to a change of block_page_mkwrite
      function; just after nilfs was merged into the mainline,
      block_page_mkwrite() started to return VM_FAULT_LOCKED instead of zero
      by the patch "mm: close page_mkwrite races" (commit:
      b827e496).  The current nilfs_page_mkwrite() is not handling
      this value properly.
      
      This corrects nilfs_page_mkwrite() and will resolve the data loss
      problem in mmap write.
      
      [This should be applied to every kernel since 2.6.30 but a fix is
       needed for 2.6.37 and prior kernels]
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: stable <stable@kernel.org>  [2.6.38]
      34094537
  11. 09 3月, 2011 1 次提交
    • R
      nilfs2: get rid of nilfs_sb_info structure · e3154e97
      Ryusuke Konishi 提交于
      This directly uses sb->s_fs_info to keep a nilfs filesystem object and
      fully removes the intermediate nilfs_sb_info structure.  With this
      change, the hierarchy of on-memory structures of nilfs will be
      simplified as follows:
      
      Before:
        super_block
             -> nilfs_sb_info
                   -> the_nilfs
                         -> cptree --+-> nilfs_root (current file system)
                                     +-> nilfs_root (snapshot A)
                                     +-> nilfs_root (snapshot B)
                                     :
                   -> nilfs_sc_info (log writer structure)
      After:
        super_block
             -> the_nilfs
                   -> cptree --+-> nilfs_root (current file system)
                               +-> nilfs_root (snapshot A)
                               +-> nilfs_root (snapshot B)
                               :
                   -> nilfs_sc_info (log writer structure)
      
      The reason why we didn't design so from the beginning is because the
      initial shape also differed from the above.  The early hierachy was
      composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
      shared nilfs object.  On the kernel 2.6.37, it was changed to the
      current shape in order to unify super block instances into one per
      device, and this cleanup became applicable as the result.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      e3154e97
  12. 08 3月, 2011 1 次提交
  13. 10 1月, 2011 1 次提交
    • R
      nilfs2: fiemap support · 622daaff
      Ryusuke Konishi 提交于
      This adds fiemap to nilfs.  Two new functions, nilfs_fiemap and
      nilfs_find_uncommitted_extent are added.
      
      nilfs_fiemap() implements the fiemap inode operation, and
      nilfs_find_uncommitted_extent() helps to get a range of data blocks
      whose physical location has not been determined.
      
      nilfs_fiemap() collects extent information by looping through
      nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      622daaff
  14. 28 5月, 2010 1 次提交
  15. 02 10月, 2009 1 次提交
  16. 28 9月, 2009 1 次提交
  17. 22 9月, 2009 1 次提交
  18. 07 4月, 2009 5 次提交