1. 09 Sep 2015, 4 commits
  2. 29 Jul 2015, 1 commit
    •
      xfs: call dax_fault on read page faults for DAX · b2442c5a
      Authored by Dave Chinner
      When modifying the patch series to handle the XFS MMAP_LOCK nesting
      of page faults, I botched the conversion of the read page fault
      path, and so it is only ever calling through the page cache. Re-add
      the necessary __dax_fault() call for such files.
      
      Because the get_blocks callback on read faults may not set up the
      mapping buffer correctly to allow unwritten extent completion to be
      run, we need to allow callers of __dax_fault() to pass a null
      complete_unwritten() callback. The DAX code always zeros the
      unwritten page when it is read faulted so there are no stale data
      exposure issues with not doing the conversion. The only downside
      will be the potential for increased CPU overhead on repeated read
      faults of the same page. If this proves to be a problem, then the
      filesystem needs to fix its get_block callback and provide a
      convert_unwritten() callback to the read fault path.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      b2442c5a
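      The fix above can be modeled in a self-contained sketch. The types and function names below are illustrative stand-ins for the kernel's DAX API of that era, not the real xfs_filemap_fault(); the point is that a DAX read fault now takes the __dax_fault() path with a NULL complete_unwritten callback, rather than always falling through to the page cache.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Minimal stand-ins for kernel types -- illustrative only. */
      struct vm_fault { int is_dax; };
      typedef void (*complete_unwritten_fn)(void *bh, int uptodate);

      static int dax_fault_calls;
      static int page_cache_calls;

      /* Models __dax_fault(): after the fix, a NULL completion callback is
       * allowed on the read path, because DAX zeroes the unwritten page at
       * read-fault time, so skipping conversion exposes no stale data. */
      static int fake_dax_fault(struct vm_fault *vmf, complete_unwritten_fn complete)
      {
          (void)vmf;
          if (complete == NULL) {
              /* read fault: skip unwritten-extent conversion */
          }
          dax_fault_calls++;
          return 0;
      }

      static int fake_filemap_fault(struct vm_fault *vmf)
      {
          (void)vmf;
          page_cache_calls++;
          return 0;
      }

      /* Before the fix, the read path always fell through to the page cache;
       * with it, DAX files call the DAX fault path directly. */
      static int read_fault(struct vm_fault *vmf)
      {
          if (vmf->is_dax)
              return fake_dax_fault(vmf, NULL);
          return fake_filemap_fault(vmf);
      }

      int main(void)
      {
          struct vm_fault dax      = { .is_dax = 1 };
          struct vm_fault buffered = { .is_dax = 0 };

          read_fault(&dax);
          read_fault(&buffered);

          assert(dax_fault_calls == 1);   /* DAX file took the DAX path */
          assert(page_cache_calls == 1);  /* non-DAX file unchanged */
          printf("ok\n");
          return 0;
      }
      ```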
  3. 05 Jul 2015, 2 commits
  4. 04 Jun 2015, 2 commits
    •
      dax: expose __dax_fault for filesystems with locking constraints · ce5c5d55
      Authored by Dave Chinner
      Some filesystems cannot call dax_fault() directly because they have
      different locking and/or allocation constraints in the page fault IO
      path. To handle this, we need to follow the same model as the
      generic block_page_mkwrite code, where the internals are exposed via
      __block_page_mkwrite() so that filesystems can wrap the correct
      locking and operations around the outside. 
      
      This is loosely based on a patch originally from Matthew Wilcox.
      Unlike the original patch, it does not change ext4 code, error
      returns or unwritten extent conversion handling.  It also adds a
      __dax_mkwrite() wrapper for .page_mkwrite implementations to do the
      right thing, too.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ce5c5d55
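      The "expose the internals, wrap the locking outside" pattern described above can be sketched with stand-in names (fs_mmap_lock, __fake_dax_fault are hypothetical, modeling __dax_fault / __block_page_mkwrite): the exposed helper takes no locks itself, and each filesystem supplies its own locking around the call.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Illustrative stand-ins; not the real kernel locking primitives. */
      static int lock_held;
      static int inner_ran_under_lock;

      static void fs_mmap_lock(void)   { lock_held = 1; }
      static void fs_mmap_unlock(void) { lock_held = 0; }

      /* Models the exposed internal helper (as __block_page_mkwrite() does
       * for block_page_mkwrite()): it does the fault work but takes no
       * locks of its own. */
      static int __fake_dax_fault(void)
      {
          inner_ran_under_lock = lock_held;
          return 0;
      }

      /* A filesystem with its own locking constraints wraps the helper
       * with whatever ordering its page-fault IO path requires. */
      static int fs_dax_fault(void)
      {
          int ret;

          fs_mmap_lock();
          ret = __fake_dax_fault();
          fs_mmap_unlock();
          return ret;
      }

      int main(void)
      {
          assert(fs_dax_fault() == 0);
          assert(inner_ran_under_lock == 1); /* work ran under the fs lock */
          assert(lock_held == 0);            /* lock dropped afterwards */
          printf("ok\n");
          return 0;
      }
      ```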
    •
      dax: don't abuse get_block mapping for endio callbacks · e842f290
      Authored by Dave Chinner
      dax_fault() currently relies on the get_block callback to attach an
      io completion callback to the mapping buffer head so that it can
      run unwritten extent conversion after zeroing allocated blocks.
      
      Instead of this hack, pass the conversion callback directly into
      dax_fault() similar to the get_block callback. When the filesystem
      allocates unwritten extents, it will set the buffer_unwritten()
      flag, and hence the dax_fault code can call the completion function
      in the contexts where it is necessary without overloading the
      mapping buffer head.
      
      Note: The changes to ext4 to use this interface are suspect at best.
      In fact, the way ext4 did this end_io assignment in the first place
      looks suspect because it only set a completion callback when there
      wasn't already some other write() call taking place on the same
      inode. The ext4 end_io code looks rather intricate and fragile with
      all its reference counting and passing to different contexts for
      modification via inode private pointers that aren't protected by
      locks...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      e842f290
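      The interface change above, passing the conversion callback as an explicit argument instead of hiding it on the mapping buffer head, can be sketched as follows. The struct and names are minimal stand-ins, not the kernel's real buffer_head or dax_fault() signature.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Minimal stand-in for the mapping buffer head; illustrative only. */
      struct buffer_head { int unwritten; };

      typedef void (*dax_iodone_t)(struct buffer_head *bh, int uptodate);

      static int conversions;

      /* The filesystem's unwritten-extent conversion, now handed to the
       * fault path directly rather than attached via get_block. */
      static void fs_convert_unwritten(struct buffer_head *bh, int uptodate)
      {
          (void)uptodate;
          bh->unwritten = 0;
          conversions++;
      }

      /* Models dax_fault() after the change: the completion callback is an
       * argument (like get_block), and it runs only when the filesystem
       * marked the mapping with the buffer_unwritten() flag. */
      static int fake_dax_fault(struct buffer_head *bh, dax_iodone_t complete)
      {
          /* ... zero the newly allocated blocks ... */
          if (bh->unwritten && complete)
              complete(bh, 1);
          return 0;
      }

      int main(void)
      {
          struct buffer_head unwritten = { .unwritten = 1 };
          struct buffer_head written   = { .unwritten = 0 };

          fake_dax_fault(&unwritten, fs_convert_unwritten);
          fake_dax_fault(&written, fs_convert_unwritten);

          assert(conversions == 1);         /* only the unwritten extent */
          assert(unwritten.unwritten == 0); /* and it was converted */
          printf("ok\n");
          return 0;
      }
      ```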
  5. 25 Apr 2015, 1 commit
    •
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Authored by Jens Axboe
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
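      The flag's effect can be modeled in a small sketch. The flag value and function below are illustrative assumptions (the real definitions live in the kernel's direct-io code); the point is that filesystem callers keep the truncate-protection count while block-device callers skip the shared atomic entirely.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Illustrative flag value; the real one is defined by the kernel. */
      #define DIO_SKIP_DIO_COUNT 0x08

      struct inode { int i_dio_count; };

      static int count_touched;

      /* Models do_blockdev_direct_IO(): the inc/dec guards against truncate,
       * and in the kernel it is an atomic op on state shared by every
       * O_DIRECT submitter -- the scaling wall the commit describes. */
      static void fake_blockdev_direct_io(struct inode *inode, unsigned int flags)
      {
          if (!(flags & DIO_SKIP_DIO_COUNT)) {
              inode->i_dio_count++;
              count_touched = 1;
          }
          /* ... submit the I/O ... */
          if (!(flags & DIO_SKIP_DIO_COUNT))
              inode->i_dio_count--;
      }

      int main(void)
      {
          struct inode fs_inode   = { 0 };
          struct inode bdev_inode = { 0 };

          fake_blockdev_direct_io(&fs_inode, 0);
          assert(count_touched == 1);   /* filesystems still take the count */
          assert(fs_inode.i_dio_count == 0);

          count_touched = 0;
          fake_blockdev_direct_io(&bdev_inode, DIO_SKIP_DIO_COUNT);
          assert(count_touched == 0);   /* block devices skip it entirely */
          printf("ok\n");
          return 0;
      }
      ```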
  6. 16 Apr 2015, 1 commit
  7. 12 Apr 2015, 1 commit
  8. 17 Feb 2015, 5 commits