1. 19 Oct 2017, 1 commit
  2. 07 Sep 2017, 4 commits
    • mm: remove nr_pages argument from pagevec_lookup{,_range}() · 397162ff
      Authored by Jan Kara
      All users of pagevec_lookup() and pagevec_lookup_range() now pass
      PAGEVEC_SIZE as a desired number of pages.
      
      Just drop the argument.
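
      A minimal sketch of the resulting interface, paraphrased from this series
      rather than copied verbatim (find_get_pages_range() is the range-aware
      helper the series relies on):

        unsigned pagevec_lookup_range(struct pagevec *pvec,
                                      struct address_space *mapping,
                                      pgoff_t *start, pgoff_t end)
        {
                /* always try to fill PAGEVEC_SIZE slots; no caller ever
                 * asked for fewer, which is why nr_pages could go away */
                pvec->nr = find_get_pages_range(mapping, start, end,
                                                PAGEVEC_SIZE, pvec->pages);
                return pagevec_count(pvec);
        }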
      
      Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      397162ff
    • ext4: use pagevec_lookup_range() in ext4_find_unwritten_pgoff() · dec0da7b
      Authored by Jan Kara
      Use pagevec_lookup_range() in ext4_find_unwritten_pgoff() since we are
      interested only in pages in the given range.  Simplify the logic now that
      no out-of-range pages are returned and the index is advanced
      automatically.
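
      An illustrative caller pattern (simplified, not the verbatim ext4 code),
      using the final form of the interface from the patch above, where the
      nr_pages argument is already gone:

        pgoff_t index = startoff >> PAGE_SHIFT;
        pgoff_t end = endoff >> PAGE_SHIFT;
        struct pagevec pvec;
        unsigned i, nr_pages;

        pagevec_init(&pvec, 0);
        do {
                nr_pages = pagevec_lookup_range(&pvec, inode->i_mapping,
                                                &index, end);
                if (nr_pages == 0)
                        break;          /* rest of the range is a hole */
                for (i = 0; i < nr_pages; i++) {
                        /* every page lies within the requested range, so
                         * no "page beyond end" checks are needed, and
                         * 'index' has already been advanced past it */
                        struct page *page = pvec.pages[i];
                }
                pagevec_release(&pvec);
        } while (index <= end);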
      
      Link: http://lkml.kernel.org/r/20170726114704.7626-6-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dec0da7b
    • mm: make pagevec_lookup() update index · d72dc8a2
      Authored by Jan Kara
      Make pagevec_lookup() (and the underlying find_get_pages()) update the
      index to the next page where iteration should continue.  Most callers
      want this, and pagevec_lookup_tag() already does it.
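
      An illustrative before/after of the common caller pattern (names are
      assumed; the nr_pages argument shown here is dropped later in this same
      series):

        /* before: every caller advanced the index by hand */
        while (pagevec_lookup(&pvec, mapping, index, PAGEVEC_SIZE)) {
                /* process pvec.pages[0..n-1] ... */
                index = pvec.pages[pagevec_count(&pvec) - 1]->index + 1;
                pagevec_release(&pvec);
        }

        /* after: find_get_pages() advances *index past the last page
         * returned, so iteration simply resumes where it left off */
        while (pagevec_lookup(&pvec, mapping, &index, PAGEVEC_SIZE)) {
                /* process pvec.pages[0..n-1] ... */
                pagevec_release(&pvec);
        }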
      
      Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d72dc8a2
    • dax: use common 4k zero page for dax mmap reads · 91d25ba8
      Authored by Ross Zwisler
      When servicing mmap() reads from file holes the current DAX code
      allocates a page cache page of all zeroes and places the struct page
      pointer in the mapping->page_tree radix tree.
      
      This has three major drawbacks:
      
      1) It consumes memory unnecessarily. For every 4k page that is read via
         a DAX mmap() over a hole, we allocate a new page cache page. This
         means that if you read 1GiB worth of pages, you end up using 1GiB of
         zeroed memory. This is easily visible by looking at the overall
         memory consumption of the system or by looking at /proc/[pid]/smaps:
      
      	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:             1048576 kB
      	Pss:             1048576 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:   1048576 kB
      	Private_Dirty:         0 kB
      	Referenced:      1048576 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      2) It is slower than using a common zero page because each page fault
         has more work to do. Instead of just inserting a common zero page we
         have to allocate a page cache page, zero it, and then insert it. Here
         are the average latencies of dax_load_hole() as measured by ftrace on
         a random test box:
      
          Old method, using zeroed page cache pages:	3.4 us
          New method, using the common 4k zero page:	0.8 us
      
         This was the average latency over 1 GiB of sequential reads done by
         this simple fio script:
      
           [global]
           size=1G
           filename=/root/dax/data
           fallocate=none
           [io]
           rw=read
           ioengine=mmap
      
      3) The fact that we had to check for both DAX exceptional entries and
         for page cache pages in the radix tree made the DAX code more
         complex.
      
      Solve these issues by following the lead of the DAX PMD code and using a
      common 4k zero page instead.  As with the PMD code we will now insert a
      DAX exceptional entry into the radix tree instead of a struct page
      pointer which allows us to remove all the special casing in the DAX
      code.
      
      Note that we do still pretty aggressively check for regular pages in the
      DAX radix tree, especially where we take action based on the bits set in
      the page.  If we ever find a regular page in our radix tree now, it most
      likely means that someone besides DAX is inserting pages (which has
      happened many times in the past), and we want to catch that early and
      fail loudly.
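
      A rough sketch of the new hole-fault path; helper and constant names such
      as dax_insert_mapping_entry() and RADIX_DAX_ZERO_PAGE are paraphrased
      from this description and may not match the merged code exactly:

        static int dax_load_hole(struct address_space *mapping, void *entry,
                                 struct vm_fault *vmf)
        {
                struct page *zero_page = ZERO_PAGE(0);

                /* record a DAX exceptional entry for the hole instead of
                 * a struct page pointer, as the PMD path already does */
                dax_insert_mapping_entry(mapping, vmf, entry, 0,
                                         RADIX_DAX_ZERO_PAGE);

                /* map the shared 4k zero page read-only: no allocation,
                 * no memset, no page cache page */
                vm_insert_mixed(vmf->vma, vmf->address,
                                page_to_pfn_t(zero_page));
                return VM_FAULT_NOPAGE;
        }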
      
      This solution also removes the extra memory consumption.  Here is that
      same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
      code:
      
      	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:                   0 kB
      	Pss:                   0 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:         0 kB
      	Private_Dirty:         0 kB
      	Referenced:            0 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      Overall system memory consumption is similarly improved.
      
      Another major change is that we remove dax_pfn_mkwrite() from our fault
      flow, and instead rely on the page fault itself to make the PTE dirty
      and writeable.  The following description from the patch adding the
      vm_insert_mixed_mkwrite() call explains this a little more:
      
         "To be able to use the common 4k zero page in DAX we need to have our
          PTE fault path look more like our PMD fault path where a PTE entry
          can be marked as dirty and writeable as it is first inserted rather
          than waiting for a follow-up dax_pfn_mkwrite() =>
          finish_mkwrite_fault() call.
      
          Right now we can rely on having a dax_pfn_mkwrite() call because we
          can distinguish between these two cases in do_wp_page():
      
                  case 1: 4k zero page => writable DAX storage
                  case 2: read-only DAX storage => writeable DAX storage
      
          This distinction is made via vm_normal_page(). vm_normal_page()
          returns false for the common 4k zero page, though, just as it does
          for DAX ptes. Instead of special casing the DAX + 4k zero page case
          we will simplify our DAX PTE page fault sequence so that it matches
          our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
          We will instead use dax_iomap_fault() to handle write-protection
          faults.
      
          This means that insert_pfn() needs to follow the lead of
          insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
          'mkwrite' is set insert_pfn() will do the work that was previously
          done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"
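
      A small caller-side sketch of that flow, assuming the
      vm_insert_mixed_mkwrite() name from the description above:

        /* on a write fault, insert the PTE already dirty and writeable,
         * so no follow-up ->pfn_mkwrite() round trip is needed */
        if (vmf->flags & FAULT_FLAG_WRITE)
                error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
        else
                error = vm_insert_mixed(vmf->vma, vmf->address, pfn);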
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91d25ba8
  3. 05 Sep 2017, 1 commit
  4. 25 Aug 2017, 2 commits
  5. 06 Aug 2017, 1 commit
    • ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize · fcf5ea10
      Authored by Jan Kara
      ext4_find_unwritten_pgoff() does not properly handle the situation where
      the starting index is in the middle of a page and blocksize < pagesize.
      The following command shows the bug on a filesystem with 1k blocksize:
      
        xfs_io -f -c "falloc 0 4k" \
                  -c "pwrite 1k 1k" \
                  -c "pwrite 3k 1k" \
                  -c "seek -a -r 0" foo
      
      In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
      SEEK_DATA) will return the correct result.
      
      Fix the problem by ignoring buffers in the page that lie before the
      starting offset.
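
      For completeness, a self-contained C reproducer equivalent to the xfs_io
      commands above. The expected offsets follow from the layout (data at
      [1k,2k) and [3k,4k)), assuming ext4 reports preallocated-but-unwritten
      blocks as holes, so SEEK_HOLE from 1k should return 2048 and SEEK_DATA
      from 2k should return 3072:

        /* cc -o seekbug seekbug.c && ./seekbug /mnt/ext4/foo (1k blocksize) */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                const char *path = argc > 1 ? argv[1] : "foo";
                char buf[1024];
                int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

                if (fd < 0)
                        return 1;
                memset(buf, 'a', sizeof(buf));
                fallocate(fd, 0, 0, 4096);       /* falloc 0 4k  */
                pwrite(fd, buf, 1024, 1024);     /* pwrite 1k 1k */
                pwrite(fd, buf, 1024, 3072);     /* pwrite 3k 1k */

                printf("SEEK_HOLE from 1k: %lld (expect 2048)\n",
                       (long long)lseek(fd, 1024, SEEK_HOLE));
                printf("SEEK_DATA from 2k: %lld (expect 3072)\n",
                       (long long)lseek(fd, 2048, SEEK_DATA));
                close(fd);
                return 0;
        }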
      Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org # 3.8+
      fcf5ea10
  6. 17 Jul 2017, 1 commit
    • VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      Authored by David Howells
      First, by applying the following with Coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove leftover excess parentheses, and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
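
      For reference, the helper being converted to is essentially the following
      (as added to include/linux/fs.h by this change; reproduced from memory,
      so treat it as a paraphrase):

        static inline bool sb_rdonly(const struct super_block *sb)
        {
                return sb->s_flags & MS_RDONLY;
        }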
      Signed-off-by: David Howells <dhowells@redhat.com>
      bc98a42c
  7. 24 Jun 2017, 1 commit
  8. 20 Jun 2017, 1 commit
  9. 25 May 2017, 1 commit
    • ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff() · 624327f8
      Authored by Eryu Guan
      ext4_find_unwritten_pgoff() is used to search for the offset of a hole or
      of data in the page range [index, end] (both inclusive), so the maximum
      number of pages to search must be at least one even when end == index.
      Otherwise the only page is skipped, and no hole or data is found, which
      is incorrect.
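
      A hedged sketch of the kind of change this implies inside
      ext4_find_unwritten_pgoff() (variable names assumed):

        /* pages to request for the inclusive range [index, end]: must be
         * at least 1 even when end == index, capped at PAGEVEC_SIZE */
        num = min_t(pgoff_t, end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1;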
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size ext4 on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/ext4/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This is uncovered by generic/285 subtests 07 and 08 on a ppc64 host,
      where the page size is 64k, because a recent change to generic/285
      reduced the preallocated file size to below 64k.
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      624327f8
  10. 22 May 2017, 2 commits
    • ext4: fix off-by-one in loop termination in ext4_find_unwritten_pgoff() · 3f1d5bad
      Authored by Jan Kara
      There is an off-by-one error in the loop termination condition in
      ext4_find_unwritten_pgoff(): 'end' may index a page beyond the end of the
      desired range if 'endoff' is page aligned. It has no visible effect, but
      it is still worth fixing.
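
      A hedged sketch of the corrected bound (names assumed):

        /* last page index that can hold bytes below 'endoff'; without the
         * "- 1", a page-aligned endoff pulls in one page too many */
        end = (endoff - 1) >> PAGE_SHIFT;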
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      3f1d5bad
    • ext4: fix SEEK_HOLE · 7d95eddf
      Authored by Jan Kara
      Currently, the SEEK_HOLE implementation in ext4 may both report a hole at
      an offset that already contains data and skip some holes during a search
      for the next hole. The first problem is demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Here we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there.  The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Here we can see that the hole at offsets 56k..128k has been ignored by
      the SEEK_HOLE call.
      
      The underlying problem is in ext4_find_unwritten_pgoff(), which is simply
      buggy. In some cases it fails to update the returned offset when it finds
      a hole (when no pages are found, or when the first found page has a
      higher index than expected); in other cases the conditions for detecting
      a hole are missing entirely (it fails to detect a situation where the
      indices of the returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices, and to handle all cases where fewer pages than expected were
      returned in one place, handling them properly there.
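
      An illustrative summary of the conditions the fixed pagevec walk has to
      treat as "hole found here" (the helper below is invented for
      illustration and is not part of the patch):

        /* true when the pages returned for the range starting at 'expected'
         * leave a gap, i.e. there is a hole before or between them */
        static bool pages_leave_hole(struct pagevec *pvec, unsigned nr_pages,
                                     pgoff_t expected)
        {
                unsigned i;

                if (nr_pages == 0)                    /* nothing left: hole */
                        return true;
                if (pvec->pages[0]->index > expected) /* gap before 1st page */
                        return true;
                for (i = 1; i < nr_pages; i++)        /* gap between pages */
                        if (pvec->pages[i]->index !=
                            pvec->pages[i - 1]->index + 1)
                                return true;
                return false;
        }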
      
      CC: stable@vger.kernel.org
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      7d95eddf
  11. 13 May 2017, 1 commit
  12. 03 Apr 2017, 1 commit
    • ext4: Add statx support · 99652ea5
      Authored by David Howells
      Return enhanced file attributes from the Ext4 filesystem.  This includes
      the following:
      
       (1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.
      
       (2) Certain FS_xxx_FL flags are mapped to stx_attributes flags.
      
      This requires that all ext4 inodes have a getattr call, not just some of
      them, so to this end, split the ext4_getattr() function and only call part
      of it where appropriate.
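
      A rough sketch of the getattr side; field and flag names follow statx(2)
      and existing ext4 conventions, but the exact code is paraphrased:

        int ext4_getattr(const struct path *path, struct kstat *stat,
                         u32 request_mask, unsigned int query_flags)
        {
                struct inode *inode = d_inode(path->dentry);
                unsigned int flags = EXT4_I(inode)->i_flags &
                                     EXT4_FL_USER_VISIBLE;

                /* (1) expose the inode creation time as the birth time */
                stat->result_mask |= STATX_BTIME;
                stat->btime.tv_sec = EXT4_I(inode)->i_crtime.tv_sec;
                stat->btime.tv_nsec = EXT4_I(inode)->i_crtime.tv_nsec;

                /* (2) map a couple of FS_xxx_FL flags to stx_attributes */
                if (flags & EXT4_APPEND_FL)
                        stat->attributes |= STATX_ATTR_APPEND;
                if (flags & EXT4_IMMUTABLE_FL)
                        stat->attributes |= STATX_ATTR_IMMUTABLE;
                stat->attributes_mask |= STATX_ATTR_APPEND |
                                         STATX_ATTR_IMMUTABLE;

                generic_fillattr(inode, stat);
                return 0;
        }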
      
      Example output:
      
      	[root@andromeda ~]# touch foo
      	[root@andromeda ~]# chattr +ai foo
      	[root@andromeda ~]# /tmp/test-statx foo
      	statx(foo) = 0
      	results=fff
      	  Size: 0               Blocks: 0          IO Block: 4096    regular file
      	Device: 08:12           Inode: 2101950     Links: 1
      	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
      	Access: 2016-02-11 17:08:29.031795451+0000
      	Modify: 2016-02-11 17:08:29.031795451+0000
      	Change: 2016-02-11 17:11:11.987790114+0000
      	 Birth: 2016-02-11 17:08:29.031795451+0000
      	Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      99652ea5
  13. 25 Feb 2017, 3 commits
  14. 23 Feb 2017, 2 commits
  15. 09 Feb 2017, 1 commit
  16. 05 Feb 2017, 1 commit
  17. 27 Dec 2016, 1 commit
  18. 21 Nov 2016, 4 commits
  19. 08 Oct 2016, 2 commits
  20. 30 Sep 2016, 2 commits
  21. 15 Sep 2016, 1 commit
  22. 27 Jul 2016, 1 commit
    • dax: remove unused fault wrappers · 6b524995
      Authored by Ross Zwisler
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
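
      For context, the wrappers being removed had roughly this shape
      (reconstructed from the description above, not the verbatim code):

        int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
                      get_block_t get_block)
        {
                struct super_block *sb = file_inode(vma->vm_file)->i_sb;
                int result;

                if (vmf->flags & FAULT_FLAG_WRITE) {
                        sb_start_pagefault(sb);
                        file_update_time(vma->vm_file);
                }
                result = __dax_fault(vma, vmf, get_block);
                if (vmf->flags & FAULT_FLAG_WRITE)
                        sb_end_pagefault(sb);

                return result;
        }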
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b524995
  23. 11 Jul 2016, 1 commit
  24. 17 May 2016, 1 commit
  25. 13 May 2016, 1 commit
    • ext4: pre-zero allocated blocks for DAX IO · 12735f88
      Authored by Jan Kara
      Currently ext4 treats DAX IO the same way as direct IO. I.e., it
      allocates unwritten extents before IO is done and converts unwritten
      extents afterwards. However this way DAX IO can race with page fault to
      the same area:
      
      ext4_ext_direct_IO()				dax_fault()
        dax_io()
          get_block() - allocates unwritten extent
          copy_from_iter_pmem()
      						  get_block() - converts
      						    unwritten block to
      						    written and zeroes it
      						    out
        ext4_convert_unwritten_extents()
      
      So data written with DAX IO gets lost. Similarly dax_new_buf() called
      from dax_io() can overwrite data that has been already written to the
      block via mmap.
      
      Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
      use them for DAX mmap. The downside of this solution is that every
      allocating write writes each block twice (zeros first, then data). Fixing
      the race with locking would be possible as well; however, we would need
      to lock out faults for the whole range written to by DAX IO, and that is
      not easy to do without locking out faults for the whole file, which seems
      too aggressive.
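
      A hedged sketch of the approach; the flag names below are assumptions
      made for illustration, not necessarily the identifiers used by the patch:

        /* for a DAX allocating write, hand out blocks that are zeroed before
         * they become visible, instead of unwritten extents that are only
         * converted (and zeroed) after the IO completes */
        if (IS_DAX(inode))
                map_flags = EXT4_GET_BLOCKS_CREATE_ZERO;   /* assumed name */
        else
                map_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT; /* unwritten path */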
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      12735f88
  26. 02 May 2016, 2 commits